Version Explanation:

In this version, the following steps are performed:

  1. Top feature selection based on the trained models’ feature importance.

    This depends on the number of CpGs selected and on the feature selection method used.

    The feature selection methods mainly serve two different purposes: one is binary classification, the other is multi-class classification.

  2. Top feature selection based on the trained models’ feature importance, using different selection methods.

    There are several selection methods, for example selection based on mean feature importance, median quantile feature importance, and frequency / common feature importance.

    • The frequency / common feature importance is computed as follows (see the sketch after this list):
      1. Select the top N features (say 40) for each model.
      2. Calculate how often each feature appears among the top N features selected in step 1.
      3. Consider every feature that appears in more than half of the models important, and collect these important features as the common features.
  3. Output two data frames that will be used in the Pareto optimality analysis.

    One is the filtered data frame with the top N features for each selection method.

    The other one is the phenotype data frame.

  4. Evaluate the performance of the features selected by each of the three methods.
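
A minimal sketch of the frequency / common-feature rule from step 2, assuming a hypothetical named list “importance_list” of per-model feature importance vectors (names are feature names, values are importance scores):

select_common_features <- function(importance_list, top_n = 40) {
  # Step 1: take the top_n features from each model
  top_sets <- lapply(importance_list, function(imp) {
    names(sort(imp, decreasing = TRUE))[seq_len(min(top_n, length(imp)))]
  })
  # Step 2: count how often each feature appears across the models
  freq <- table(unlist(top_sets))
  # Step 3: keep the features that appear in more than half of the models
  names(freq)[freq > length(importance_list) / 2]
}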

Input Section

This part is a collection of inputs; change them as needed.

File Paths:

csv_Ni1905FilePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\ADNI_covariate_withEpiage_1905obs.csv"

TopSelectedCpGs_filePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\Top5K_CpGs.csv"

Number of top CpGs kept:

# Number of top CpGs kept, based on standard deviation
Number_N_TopNCpGs<-params$INPUT_Number_N_TopNCpGs
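
For reference, a minimal sketch of how these knit-time parameters might be declared in the R Markdown YAML header (values taken from the output file names shown later in this section):

# A sketch of the Rmd YAML header declaring the knit-time parameters:
#
# params:
#   INPUT_Number_N_TopNCpGs: 5000
#   INPUT_OUT_NUMBER_FEATURES: 250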

Section Inputs:

Section 1.6.1 Missing Value

# Go to the INPUT section and find "Impute_NA_FLAG_NUM":
# if we want to impute the NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# if we want to impute the NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"

Impute_NA_FLAG_NUM = 1

Section 1.6.2 Feature Selection

# Go to the INPUT section and find "METHOD_FEATURE_FLAG_NUM":
# if we want to use 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# if we want to use the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# if we want to use 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# if we want to use classification with CN vs AD, set "METHOD_FEATURE_FLAG_NUM = 4"
# if we want to use classification with CN vs MCI, set "METHOD_FEATURE_FLAG_NUM = 5"
# if we want to use classification with MCI vs AD, set "METHOD_FEATURE_FLAG_NUM = 6"

METHOD_FEATURE_FLAG_NUM = 3

Section 7.0 Important Features

# Go to the "INPUT" section to set the number of common features needed
# Generally this is for visualization

NUM_COMMON_FEATURES_SET = 20
NUM_COMMON_FEATURES_SET_Frequency = 20

Section 8.0 Feature Selection and Output

The feature selection methods:

  1. based on mean feature importance (set “INPUT_Method_Mean_Choose = TRUE”)
  2. based on median quantile feature importance (set “INPUT_Method_Median_Choose = TRUE”)
  3. based on feature frequency importance (set “INPUT_Method_Frequency_Choose = TRUE”)
    • Comment: with the feature frequency importance method, the input number of features N is only used in the first step (selecting the top N features for each model), so the final number of features kept may not be exactly N.
  4. Setting a method’s input flag to FALSE means no data are generated for that method; to output the data for every method, set all flags to TRUE. In summary, set the corresponding flag to TRUE to output the data set selected by that method. A sketch of the mean- and median-based selection follows this list.
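
A minimal sketch of the mean- and median-based selection, assuming the same hypothetical “importance_list” as in the sketch above and that every model reports an importance score for the same feature set:

select_by_summary <- function(importance_list, n, summary_fun = mean) {
  imp_mat <- do.call(rbind, importance_list)  # one row per model, one column per feature
  scores  <- apply(imp_mat, 2, summary_fun)   # summarize each feature across models
  names(sort(scores, decreasing = TRUE))[seq_len(n)]
}

Passing summary_fun = median gives the median-based variant; in both cases exactly n features are returned, unlike the frequency method.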
# This is the flag for the phenotype data output.
# If set to TRUE, output the file: check whether the file already exists at the given path; if not, write the file, and if it already exists, skip the write.
# If set to FALSE, do not output the phenotype file.
# NOTICE THAT: the phenotype file is selected from "Merged_df".

phenoOutPUt_FLAG = TRUE
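
A minimal sketch of the write-if-absent behavior described above, assuming a hypothetical phenotype data frame “pheno_df” and output path “pheno_path”:

if (phenoOutPUt_FLAG) {
  if (!file.exists(pheno_path)) {
    write.csv(pheno_df, pheno_path, row.names = TRUE)  # write only when the file is absent
  } else {
    message("Phenotype file already exists; skipping write.")
  }
}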
  

  
# For 8.0 Feature Selection and Output :
# NUM_FEATURES <- INPUT_NUMBER_FEATURES
#   This is the number of features needed
# Method_Selected_Choose <- INPUT_Method_Selected_Choose
#   This is the method used for the output-stage feature selection


INPUT_NUMBER_FEATURES = params$INPUT_OUT_NUMBER_FEATURES
INPUT_Method_Mean_Choose = TRUE
INPUT_Method_Median_Choose = TRUE
INPUT_Method_Frequency_Choose = TRUE



if(INPUT_Method_Mean_Choose || INPUT_Method_Median_Choose || INPUT_Method_Frequency_Choose){
  OUTUT_file_directory<- "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_SelectedFeatures\\"
  OUTUT_CSV_PATHNAME <- paste(OUTUT_file_directory,"INPUT_",Number_N_TopNCpGs,"CpGs\\",sep="")
  
  if (dir.exists(OUTUT_CSV_PATHNAME)) {
    message("Directory already exists.")
    } else {
    dir.create(OUTUT_CSV_PATHNAME, recursive = TRUE)
    message("Directory created.")
    }
  
}
## Directory already exists.

Section 10.0 Performance Metrics

FLAG_WRITE_METRICS_DF is the flag that controls whether to output the CSV containing the performance metrics.

# This is the flag for outputting this file's metrics, including the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods

Metrics_Table_Output_FLAG = TRUE


FLAG_WRITE_METRICS_DF = TRUE



if(FLAG_WRITE_METRICS_DF){
  OUTUT_PerfMertics_directory<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_PerformanceMetrics\\"
  
  OUTUT_PerformanceMetricsCSV_PATHNAME <- paste(OUTUT_PerfMertics_directory,"INPUT_",Number_N_TopNCpGs,"CpGs_",INPUT_NUMBER_FEATURES,"SelFeature_PerMetrics.csv",sep="")
  
  if (dir.exists(OUTUT_PerfMertics_directory)) {
    message("Directory already exists.")
    } else {
    dir.create(OUTUT_PerfMertics_directory, recursive = TRUE)
    message("Directory created.")
    }
  print(OUTUT_PerformanceMetricsCSV_PATHNAME)
  
}
## Directory already exists.
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"

1. Preprocess

Packages and libraries that may need to be installed and loaded.

# Function to check and install Bioconductor packages (here: "limma")

install_bioc_packages <- function(packages) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      BiocManager::install(pkg, dependencies = TRUE)
    } else {
      message(paste("Package", pkg, "is already installed."))
    }
  }
}


install_bioc_packages("limma")
## Package limma is already installed.
print("The required packages are all successfully installed.")
## [1] "The required packages are all successfully installed."
library(limma)

Set the seed for reproducibility.

set.seed(123)

1.1 Data Read and Preview

csv_NI1905<-read.csv(csv_Ni1905FilePath)
csv_NI1905_RAW <- csv_NI1905
TopSelectedCpGs<-read.csv(TopSelectedCpGs_filePath, check.names = FALSE)
TopSelectedCpGs_RAW <- TopSelectedCpGs

1.1.1 csv_NI1905 (“ADNI_covariate_withEpiage_1905obs.csv”)

head(csv_NI1905,n=3)
rownames(csv_NI1905)<-as.matrix(csv_NI1905[,"barcodes"])
dim(csv_NI1905)
## [1] 1905   23

1.1.2 TopSelectedCpGs

dim(TopSelectedCpGs)
## [1] 5000 1921
head(TopSelectedCpGs[,1:8])
rownames(TopSelectedCpGs)<-TopSelectedCpGs[,1]
head(rownames(TopSelectedCpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopSelectedCpGs))
## [1] "ProbeID"             "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopSelectedCpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"

1.1.3 “TopN_CpGs”

1.1.3.1 Select Top N CpGs

This part is used to adjust the set of CpGs in use; it keeps the top N CpGs ranked by standard deviation.

sorted_TopSelectedCpGs <- TopSelectedCpGs[order(-TopSelectedCpGs$sdDev), ]
TopN_CpGs <- head(sorted_TopSelectedCpGs,Number_N_TopNCpGs )
TopN_CpGs_RAW<-TopN_CpGs

The variable “TopN_CpGs” will be used for processing the data. Now let’s take a look at it.

1.1.3.2 Preview “TopN_CpGs”

dim(TopN_CpGs)
## [1] 5000 1921
rownames(TopN_CpGs)<-TopN_CpGs[,1]
head(rownames(TopN_CpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopN_CpGs))
## [1] "ProbeID"             "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopN_CpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"

1.2 Check Duplicates

Now, let’s check for duplicate Sample IDs (“barcodes”):

Start with people who don’t have a unique ID (“uniqueID = 0”):

library(dplyr)
dim(csv_NI1905[csv_NI1905$uniqueID == 0, ])
## [1] 1256   23
dim(csv_NI1905[csv_NI1905$uniqueID == 1, ])
## [1] 649  23
duplicates <-  csv_NI1905[csv_NI1905$uniqueID == 0, ] %>%
  group_by(barcodes) %>%
  filter(n() > 1) %>%
  ungroup()

print(dim(duplicates))
## [1]  0 23
rm(duplicates)

Based on the dimensions shown above, these records all have different Sample IDs (“barcodes”).

Then check all records for duplicated Sample IDs (“barcodes”).

duplicates <-  csv_NI1905 %>%
  group_by(barcodes) %>%
  filter(n() > 1) %>%
  ungroup()
print(dim(duplicates))
## [1]  0 23

From the above output, we can see that the Sample IDs (“barcodes”) are unique.

names(csv_NI1905)
##  [1] "barcodes"    "RID.a"       "prop.B"      "prop.NK"     "prop.CD4T"   "prop.CD8T"   "prop.Mono"   "prop.Neutro" "prop.Eosino" "DX"          "age.now"     "PTGENDER"    "ABETA"       "TAU"        
## [15] "PTAU"        "PC1"         "PC2"         "PC3"         "ageGroup"    "ageGroupsq"  "DX_num"      "uniqueID"    "Horvath"

The same person may appear multiple times at different time points, so we only keep the records that have a unique ID (“uniqueID = 1”).

csv_NI1905<-csv_NI1905[csv_NI1905$uniqueID == 1, ]
dim(csv_NI1905)
## [1] 649  23

1.3 Remove NA values

Since “DX” will be the response variable, we first remove all rows with an NA value in the “DX” column.

# "DX" will be Y; remove all rows with an NA value in the "DX" column
csv_NI1905<-csv_NI1905 %>% filter(!is.na(DX)) 

1.4 Sample Name filtering

We only keep the samples that appear in both datasets.

Matrix_sample_names_NI1905 <- as.matrix(csv_NI1905[,"barcodes"])
Matrix_sample_names_TopN_CpGs <- as.matrix(colnames(TopN_CpGs))
common_sample_names<-intersect(Matrix_sample_names_NI1905,Matrix_sample_names_TopN_CpGs)
csv_NI1905 <- csv_NI1905 %>% filter(barcodes %in% common_sample_names)
TopN_CpGs <- TopN_CpGs[, common_sample_names, drop = FALSE]
head(TopN_CpGs[,1:3],n=2)
dim(TopN_CpGs)
## [1] 5000  648
dim(csv_NI1905)
## [1] 648  23

1.5 Merged DataFrame

1.5.1 Merge two datasets

Merge these two datasets and store the result in “merged_df”.

trans_TopN_CpGs<-t(TopN_CpGs)

# Check the total length of the rownames
# Recall that the sample names have been matched and neither data frame has duplicates
# Now, order both data frames by rowname and bind them together. Because the ordered rownames match one-to-one, the merged data frame lines the two sources up correctly.

trans_TopN_CpGs_ordered<-trans_TopN_CpGs[order(rownames(trans_TopN_CpGs)),]
csv_NI1905_ordered<-csv_NI1905[order(rownames(csv_NI1905)),]
print("The rownames matchs in order:")
## [1] "The rownames matchs in order:"
check_1 = length(rownames(csv_NI1905_ordered))
check_2 = sum(rownames(csv_NI1905_ordered)==rownames(trans_TopN_CpGs_ordered))
print(check_1==check_2)
## [1] TRUE
merged_df_raw<-cbind(trans_TopN_CpGs_ordered,csv_NI1905_ordered)
phenotic_features_RAW<-colnames(csv_NI1905)
print(phenotic_features_RAW)
##  [1] "barcodes"    "RID.a"       "prop.B"      "prop.NK"     "prop.CD4T"   "prop.CD8T"   "prop.Mono"   "prop.Neutro" "prop.Eosino" "DX"          "age.now"     "PTGENDER"    "ABETA"       "TAU"        
## [15] "PTAU"        "PC1"         "PC2"         "PC3"         "ageGroup"    "ageGroupsq"  "DX_num"      "uniqueID"    "Horvath"
phenoticPart_RAW <- merged_df_raw[,phenotic_features_RAW]
dim(phenoticPart_RAW)
## [1] 648  23
head(phenoticPart_RAW)
head(merged_df_raw[,1:3])
merged_df<-merged_df_raw

1.5.2 “merged_df”

head(colnames(merged_df))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"

1.5.3 Feature Names

(1) CpGs (beta values)

The CpG feature names can be accessed through “featureName_CpGs”:

featureName_CpGs<-rownames(TopN_CpGs)
length(featureName_CpGs)
## [1] 5000
head(featureName_CpGs)
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"

1.6 Clean Merged datasets

clean_merged_df<-merged_df

1.6.1 Missing Value

missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## ABETA   TAU  PTAU 
##   109   109   109

Choose Output Data

Choose the method we want to apply to the data. The output dataset name is “clean_merged_df”.

# Go to the INPUT section and find "Impute_NA_FLAG_NUM":
# if we want to impute the NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# if we want to impute the NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"

Impute_NA_FLAG = Impute_NA_FLAG_NUM

(1) Impute with Mean

if (Impute_NA_FLAG == 1){
  clean_merged_df_imputed_mean<-clean_merged_df

  # Replace the NAs in each affected column with that column's mean
  # (computed with the NAs removed)
  for (col in c("ABETA", "TAU", "PTAU")) {
    col_mean_rmNA <- mean(clean_merged_df[[col]], na.rm = TRUE)
    clean_merged_df_imputed_mean[[col]][
      is.na(clean_merged_df_imputed_mean[[col]])] <- col_mean_rmNA
  }

  clean_merged_df = clean_merged_df_imputed_mean
}

(2) Impute with KNN

library(VIM)
if (Impute_NA_FLAG == 2){
  df_imputed_KNN <- kNN(merged_df, k = 5)
  imputed_summary <- colSums(df_imputed_KNN[, grep("_imp", names(df_imputed_KNN))])
  print(imputed_summary[imputed_summary > 0])
  clean_merged_df<-df_imputed_KNN[, -grep("_imp", names(df_imputed_KNN))]
}

Check that the missing value problem is solved.

missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## named numeric(0)

1.6.2 Feature Selection

Choose the Method to Use

Choose the method we want to use.

# Go to the INPUT section and find "METHOD_FEATURE_FLAG_NUM":
# if we want to use 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# if we want to use the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# if we want to use 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# if we want to use classification with CN vs AD, set "METHOD_FEATURE_FLAG_NUM = 4"
# if we want to use classification with CN vs MCI, set "METHOD_FEATURE_FLAG_NUM = 5"
# if we want to use classification with MCI vs AD, set "METHOD_FEATURE_FLAG_NUM = 6"

METHOD_FEATURE_FLAG = METHOD_FEATURE_FLAG_NUM

(1) Method One

if (METHOD_FEATURE_FLAG ==  1){
  df_fs_method1 <- clean_merged_df
}
Picking Features
if(METHOD_FEATURE_FLAG ==  1){
  
  phenotic_features_m1<-c("DX","age.now","PTGENDER",
                          "PC1","PC2","PC3")
  pickedFeatureName_m1<-c(phenotic_features_m1,featureName_CpGs)
  df_fs_method1<-clean_merged_df[,pickedFeatureName_m1]
  df_fs_method1$DX<-as.factor(df_fs_method1$DX)
  df_fs_method1$PTGENDER<-as.factor(df_fs_method1$PTGENDER)
  head(df_fs_method1[,1:5],n=3)
  dim(df_fs_method1)
}
if(METHOD_FEATURE_FLAG ==  1){
  dim(df_fs_method1)
}
Perform DMP - Use LIMMA

Create contrast matrix for comparing CN vs Dementia vs MCI

if(METHOD_FEATURE_FLAG == 1){

  pheno_data_m1 <- df_fs_method1[,phenotic_features_m1] 
  head(pheno_data_m1[,1:5],n=3)
  
  pheno_data_m1$DX <- factor(pheno_data_m1$DX, levels = c("CN", "MCI", "Dementia"))
  design_m1 <- model.matrix(~ 0 + DX + age.now + PTGENDER + PC1 + PC2 + PC3,
                         data = pheno_data_m1)

  colnames(design_m1)[colnames(design_m1) == "DXCN"] <- "CN"
  colnames(design_m1)[colnames(design_m1) == "DXDementia"] <- "Dementia"
  colnames(design_m1)[colnames(design_m1) == "DXMCI"] <- "MCI"

  head(design_m1)
  
  cpg_matrix_m1 <- t(as.matrix(df_fs_method1[, featureName_CpGs]))
  fit_m1 <- lmFit(cpg_matrix_m1, design_m1)


}
if(METHOD_FEATURE_FLAG == 1){
  # Here we have three labels; the contrasts comparing the groups are:
  contrast_matrix_m1 <- makeContrasts(
  MCI_vs_CN = MCI - CN,
  Dementia_vs_CN = Dementia - CN,
  Dementia_vs_MCI = Dementia - MCI,
  levels = design_m1
  )
  fit2_m1 <- contrasts.fit(fit_m1, contrast_matrix_m1)
  fit2_m1 <- eBayes(fit2_m1)
  
  topTable(fit2_m1, coef = "MCI_vs_CN") 
  topTable(fit2_m1, coef = "Dementia_vs_CN")  
  topTable(fit2_m1, coef = "Dementia_vs_MCI") 
  summary_results_m1 <- decideTests(fit2_m1,method = "nestedF", adjust.method = "none", p.value = 0.05)
  table(summary_results_m1)

  
}
if(METHOD_FEATURE_FLAG == 1){

  significant_dmp_filter_m1 <- summary_results_m1 != 0 
  significant_cpgs_m1_DMP <- unique(rownames(summary_results_m1)[
    apply(significant_dmp_filter_m1, 1, any)])
  print(paste("The significant CpGs after DMP are:",
             paste(significant_cpgs_m1_DMP, collapse = ", ")))
  print(paste("Length of CpGs after DMP:", 
              length(significant_cpgs_m1_DMP)))
  
  pickedFeatureName_m1_afterDMP<-c(phenotic_features_m1,significant_cpgs_m1_DMP)
  df_fs_method1<-df_fs_method1[,pickedFeatureName_m1_afterDMP]

  dim(df_fs_method1)
}
Use “Recipe” - Process Data
if(METHOD_FEATURE_FLAG == 1){
  
  library(recipes)
  df_picked <- df_fs_method1
 
  rec <- recipe(DX ~ ., data = df_picked) %>%
    step_zv(all_predictors()) %>%                    # drop zero-variance predictors
   # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes()) %>%   # one-hot encode nominal predictors
    step_corr(all_predictors(), threshold = 0.7)     # drop one of each pair correlated above 0.7

  rec_prep <- prep(rec, df_picked)

  processed_data_m1 <- bake(rec_prep, new_data = df_picked)
  dim(processed_data_m1)
  processed_data_m1_df<-as.data.frame(processed_data_m1)
  rownames(processed_data_m1_df)<-rownames(df_picked)
}
if(METHOD_FEATURE_FLAG == 1){
  AfterProcess_FeatureName_m1<-colnames(processed_data_m1)
  head(AfterProcess_FeatureName_m1)
  tail(AfterProcess_FeatureName_m1)
}
if(METHOD_FEATURE_FLAG == 1){
  head(processed_data_m1[,1:5])
}
if(METHOD_FEATURE_FLAG == 1){
  lastColumn_NUM<-dim(processed_data_m1)[2]
  last5Column_NUM<-lastColumn_NUM-5
  head(processed_data_m1[,last5Column_NUM :lastColumn_NUM])
}

(2) Method Two - PCA

if(METHOD_FEATURE_FLAG == 2){
  bloodPropFeatureName<-c("RID.a","prop.B","prop.NK",
                          "prop.CD4T","prop.CD8T","prop.Mono",
                          "prop.Neutro","prop.Eosino")
  pickedFeatureName_m2<-c("DX","age.now",
                          "PTGENDER",bloodPropFeatureName,
                          "ABETA","TAU","PTAU",featureName_CpGs)
  df_fs_method2<-clean_merged_df[,pickedFeatureName_m2]
}
Use “Recipe” to preprocess the Data
if(METHOD_FEATURE_FLAG == 2){
  library(recipes)

  rec <- recipe(DX ~ ., data = df_fs_method2) %>%
    step_zv(all_predictors()) %>%
    step_normalize(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_fs_method2)

  processed_data_m2 <- bake(rec_prep, new_data = df_fs_method2)
  dim(processed_data_m2)
}
PCA
if(METHOD_FEATURE_FLAG == 2){
  
  X_df_m2<-subset(processed_data_m2,select = -DX)
  Y_df_m2<-processed_data_m2$DX

  pca_result <- prcomp(X_df_m2, center = TRUE, scale. = TRUE)

  summary(pca_result)

  screeplot(pca_result,type="lines")

}
if(METHOD_FEATURE_FLAG == 2){
  
  PCA_component_threshold<-0.7
}
if(METHOD_FEATURE_FLAG == 2){
  library(caret)
  preproc<-preProcess(X_df_m2,method="pca",
                      thresh = PCA_component_threshold)
  X_df_m2_transformed_PCA <- predict(preproc,X_df_m2)
  data_processed_PCA<-data.frame(X_df_m2_transformed_PCA,Y_df_m2)
  colnames(data_processed_PCA)[
    which(colnames(data_processed_PCA)=="Y_df_m2")]<-"DX"
  head(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 2){
  processed_data_m2<-data_processed_PCA
  AfterProcess_FeatureName_m2<-colnames(data_processed_PCA)
}

(3) Method Three - Convert to Binary Class

if(METHOD_FEATURE_FLAG == 3){
  
  df_fs_method3<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 3){
  phenotic_features_m3<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m3<-c(phenotic_features_m3,featureName_CpGs)
  df_picked_m3<-df_fs_method3[,pickedFeatureName_m3]

  df_picked_m3$DX<-as.factor(df_picked_m3$DX)
  df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
  head(df_picked_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
  dim(df_picked_m3)
}
## [1]  648 5006
Change to Two Class Classification
if(METHOD_FEATURE_FLAG == 3){
  df_picked_m3<-df_picked_m3 %>% mutate(
    DX = ifelse(DX == "CN", "CN",ifelse(DX 
    %in% c("MCI","Dementia"),"CI",NA)))
  
  df_picked_m3$DX<-as.factor(df_picked_m3$DX)
  df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)

  head(df_picked_m3[1:10],n=3)

}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 3){
  pheno_data_m3 <- df_picked_m3[,phenotic_features_m3] 
  head(pheno_data_m3[,1:5],n=3)

  design_m3 <- model.matrix(~0 + .,data=pheno_data_m3)

  colnames(design_m3)[colnames(design_m3) == "DXCN"] <- "CN"
  colnames(design_m3)[colnames(design_m3) == "DXCI"] <- "CI"

  head(design_m3)

  beta_values_m3 <- t(as.matrix(df_fs_method3[,featureName_CpGs]))

}

In order to perform the differential analysis - identifying Differentially Methylated Positions (DMPs) - we have to define the contrast that we are interested in. In method 3, we focus on two groups, so there is one contrast of interest.

if(METHOD_FEATURE_FLAG == 3){

  fit_m3 <- lmFit(beta_values_m3, design_m3)
  head(fit_m3$coefficients)


  contrast.matrix <- makeContrasts(CI - CN, levels = design_m3)
 
  fit2_m3 <- contrasts.fit(fit_m3, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m3 <- eBayes(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  decideTests(fit2_m3)
}
## TestResults matrix
##             Contrasts
##              CI - CN
##   cg08223187       0
##   cg15794987       0
##   cg04821830       0
##   cg24629711       0
##   cg17380855       0
## 4995 more rows ...
if(METHOD_FEATURE_FLAG == 3){
  dmp_results_m3_try1 <- decideTests(
    fit2_m3, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m3_try1)

}
## dmp_results_m3_try1
##    0 
## 5000
if(METHOD_FEATURE_FLAG == 3){
  # Identify DMPs, we will use this one:
  dmp_results_m3 <- decideTests(
    fit2_m3, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m3)
}
## dmp_results_m3
##   -1    0    1 
##  200 4619  181
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 3){

  significant_dmp_filter <- dmp_results_m3 != 0 
  significant_cpgs_m3_DMP <- rownames(dmp_results_m3)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m3_afterDMP<-c(phenotic_features_m3,significant_cpgs_m3_DMP)
  df_picked_m3<-df_picked_m3[,pickedFeatureName_m3_afterDMP]

  dim(df_picked_m3)
}
## [1] 648 387
Visualize the results of DMP

The “Volcano Plot” is one way to visualize the results of a differential expression (DE) analysis.

The X-axis shows the log-fold change in methylation levels between the two classes. The log fold change (LogFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).

Interpretation of logFC:

  • Positive LogFC: Indicates that the measurement is higher in the first group compared to the second group, here means hypermethylation (increase in methylation).

  • Negative LogFC: Indicates that the measurement is lower in the first group compared to the second group, here means hypomethylation (decrease in methylation) in the experimental condition compared to the reference.

  • LogFC of 0: Indicates no difference in the measurement between the two groups.
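
A toy illustration of the LogFC formula given above, using hypothetical beta values for a single CpG in two groups:

group1 <- c(0.80, 0.78, 0.82)  # e.g. the first group's samples
group2 <- c(0.40, 0.42, 0.38)  # e.g. the second group's samples
log2(mean(group1) / mean(group2))  # = 1: the first group is about twice as methylated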

The Y-axis shows some measure of statistical significance, such as the log-odds or “B” statistic. In the following, we will use the B statistic. The log-odds can be calculated as \(B = \log_e(\text{posterior odds})\).

Interpretation of B-value:

  • Higher B-value: Indicates stronger evidence for differential methylation.

  • Lower (or negative) B-value: Indicates weaker evidence for differential methylation.

  • B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.
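
Since \(B = \log_e(\text{posterior odds})\), a B value can be converted back into a posterior probability of differential methylation. A toy illustration:

B <- 1.5
exp(B) / (1 + exp(B))  # ~0.82, i.e. about an 82% posterior probability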

A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 3){
  full_results_m3 <- topTable(fit2_m3, number=Inf)
  full_results_m3 <- tibble::rownames_to_column(full_results_m3,"ID")
  head(full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  sorted_full_results_m3 <- full_results_m3[
    order(full_results_m3$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
  library(ggplot2)
  ggplot(full_results_m3,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoff

if(METHOD_FEATURE_FLAG == 3){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m3 <- full_results_m3 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m3, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider increasing max.overlaps

Now, let’s change the y-axis to the P value

if(METHOD_FEATURE_FLAG == 3){
  ggplot(full_results_m3,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}

if(METHOD_FEATURE_FLAG == 3){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m3 <- full_results_m3 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m3, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider increasing max.overlaps

Use “Recipe” - Process Data
if(METHOD_FEATURE_FLAG == 3){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m3) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m3)

  processed_data_m3 <- bake(rec_prep, new_data = df_picked_m3)
  processed_data_m3_df <- as.data.frame(processed_data_m3)
  rownames(processed_data_m3_df) <- rownames(df_picked_m3)
  dim(processed_data_m3)
}
## [1] 648 314
if(METHOD_FEATURE_FLAG == 3){
  AfterProcess_FeatureName_m3<-colnames(processed_data_m3)
  head(AfterProcess_FeatureName_m3)
  tail(AfterProcess_FeatureName_m3)
}
## [1] "cg21243064" "cg27577781" "cg20685672" "cg03660162" "cg17042243" "DX"
if(METHOD_FEATURE_FLAG == 3){
  levels(df_picked_m3$DX)
}
## [1] "CI" "CN"
if(METHOD_FEATURE_FLAG == 3){
  head(processed_data_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
  lastColumn_NUM_m3<-dim(processed_data_m3)[2]
  last5Column_NUM_m3<-lastColumn_NUM_m3-5
  head(processed_data_m3[,last5Column_NUM_m3 :lastColumn_NUM_m3])
}
if(METHOD_FEATURE_FLAG == 3){
  levels(processed_data_m3$DX)
}
## [1] "CI" "CN"

(4) Method Four - CN vs AD

In this method, only the CN and AD (Dementia) classes will be considered.

if(METHOD_FEATURE_FLAG == 4){
  
  df_fs_method4<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 4){
  phenotic_features_m4<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m4<-c(phenotic_features_m4,featureName_CpGs)
  df_picked_m4<-df_fs_method4[,pickedFeatureName_m4]

  df_picked_m4$DX<-as.factor(df_picked_m4$DX)
  df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
  head(df_picked_m4[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
  dim(df_picked_m4)
}
Filter and Change to Classification with ‘CN vs AD (Dementia)’
if(METHOD_FEATURE_FLAG == 4){
  df_picked_m4<-df_picked_m4 %>%  filter(DX != "MCI") %>% droplevels()

  
  df_picked_m4$DX<-as.factor(df_picked_m4$DX)
  df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)

  head(df_picked_m4[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 4){
  print(dim(df_picked_m4))
  print(table(df_picked_m4$DX))
}
if(METHOD_FEATURE_FLAG == 4){
  df_fs_method4 <- df_fs_method4 %>%  filter(DX != "MCI") %>% droplevels()
  df_fs_method4$DX<-as.factor(df_fs_method4$DX)
  print(head(df_fs_method4))
  print(dim(df_fs_method4))
}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 4){
  pheno_data_m4 <- df_picked_m4[,phenotic_features_m4] 
  print(head(pheno_data_m4[,1:5],n=3))

  design_m4 <- model.matrix(~0 + .,data=pheno_data_m4)

  colnames(design_m4)[colnames(design_m4) == "DXCN"] <- "CN"
  colnames(design_m4)[colnames(design_m4) == "DXDementia"] <- "Dementia"

  print(head(design_m4))

  beta_values_m4 <- t(as.matrix(df_fs_method4[,featureName_CpGs]))

}

In order to perform the differential analysis - identifying Differentially Methylated Positions (DMPs) - we have to define the contrast that we are interested in. In method 4, we focus on two groups (CN and Dementia), so there is one contrast of interest.

if(METHOD_FEATURE_FLAG == 4){

  fit_m4 <- lmFit(beta_values_m4, design_m4)
  head(fit_m4$coefficients)


  contrast.matrix <- makeContrasts(Dementia - CN, levels = design_m4)
 
  fit2_m4 <- contrasts.fit(fit_m4, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m4 <- eBayes(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  decideTests(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  dmp_results_m4_try1 <- decideTests(
    fit2_m4, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m4_try1)

}

The constraints are too tight; let’s relax them.

if(METHOD_FEATURE_FLAG == 4){
  # Identify DMPs, we will use this one:
  dmp_results_m4 <- decideTests(
    fit2_m4, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m4)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 4){

  significant_dmp_filter <- dmp_results_m4 != 0 
  significant_cpgs_m4_DMP <- rownames(dmp_results_m4)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m4_afterDMP<-c(phenotic_features_m4,significant_cpgs_m4_DMP)
  df_picked_m4<-df_picked_m4[,pickedFeatureName_m4_afterDMP]

  dim(df_picked_m4)
}
Visualize the results of DMP

The volcano plot axes, and the interpretation of LogFC and the B statistic, are the same as described for Method Three above. A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 4){
  full_results_m4 <- topTable(fit2_m4, number=Inf)
  full_results_m4 <- tibble::rownames_to_column(full_results_m4,"ID")
  head(full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  sorted_full_results_m4 <- full_results_m4[
    order(full_results_m4$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  library(ggplot2)
  ggplot(full_results_m4,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoff

if(METHOD_FEATURE_FLAG == 4){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m4 <- full_results_m4 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m4, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to the P value

if(METHOD_FEATURE_FLAG == 4){
  ggplot(full_results_m4,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 4){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m4 <- full_results_m4 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m4, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” - Process Data
if(METHOD_FEATURE_FLAG == 4){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m4) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m4)

  processed_data_m4 <- bake(rec_prep, new_data = df_picked_m4)
  processed_data_m4_df <- as.data.frame(processed_data_m4)
  rownames(processed_data_m4_df) <- rownames(df_picked_m4)
  print(dim(processed_data_m4))
}
if(METHOD_FEATURE_FLAG == 4){
  AfterProcess_FeatureName_m4<-colnames(processed_data_m4)
  print(length(AfterProcess_FeatureName_m4))
  head(AfterProcess_FeatureName_m4)
  tail(AfterProcess_FeatureName_m4)
}
if(METHOD_FEATURE_FLAG == 4){
  levels(df_picked_m4$DX)
}
if(METHOD_FEATURE_FLAG == 4){
  lastColumn_NUM_m4<-dim(processed_data_m4)[2]
  last5Column_NUM_m4<-lastColumn_NUM_m4-5
  head(processed_data_m4[,last5Column_NUM_m4 :lastColumn_NUM_m4])
}
if(METHOD_FEATURE_FLAG == 4){
  print(levels(processed_data_m4$DX))
  print(dim(processed_data_m4))
}

(5) Method Five - CN vs MCI

In this method, only the CN and MCI classes will be considered.

if(METHOD_FEATURE_FLAG == 5){
  
  df_fs_method5<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 5){
  phenotic_features_m5<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m5<-c(phenotic_features_m5,featureName_CpGs)
  df_picked_m5<-df_fs_method5[,pickedFeatureName_m5]

  df_picked_m5$DX<-as.factor(df_picked_m5$DX)
  df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
  head(df_picked_m5[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
  dim(df_picked_m5)
}
Filter and Change to Classification with ‘CN vs MCI’
if(METHOD_FEATURE_FLAG == 5){
  df_picked_m5<-df_picked_m5 %>%  filter(DX != "Dementia") %>% droplevels()

  
  df_picked_m5$DX<-as.factor(df_picked_m5$DX)
  df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)

  head(df_picked_m5[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 5){
  print(dim(df_picked_m5))
  print(table(df_picked_m5$DX))
}
if(METHOD_FEATURE_FLAG == 5){
  df_fs_method5 <- df_fs_method5 %>%  filter(DX != "Dementia") %>% droplevels()
  df_fs_method5$DX<-as.factor(df_fs_method5$DX)
  print(head(df_fs_method5))
  print(dim(df_fs_method5))
}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 5){
  pheno_data_m5 <- df_picked_m5[,phenotic_features_m5] 
  print(head(pheno_data_m5[,1:5],n=3))

  design_m5 <- model.matrix(~0 + .,data=pheno_data_m5)

  colnames(design_m5)[colnames(design_m5) == "DXCN"] <- "CN"
  colnames(design_m5)[colnames(design_m5) == "DXMCI"] <- "MCI"

  print(head(design_m5))

  beta_values_m5 <- t(as.matrix(df_fs_method5[,featureName_CpGs]))

}

In order to perform the differential analysis - identifying Differentially Methylated Positions (DMPs) - we have to define the contrast that we are interested in. In method 5, we focus on two groups (CN and MCI), so there is one contrast of interest.

if(METHOD_FEATURE_FLAG == 5){

  fit_m5 <- lmFit(beta_values_m5, design_m5)
  head(fit_m5$coefficients)


  contrast.matrix <- makeContrasts(MCI - CN, levels = design_m5)
 
  fit2_m5 <- contrasts.fit(fit_m5, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m5 <- eBayes(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  decideTests(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  dmp_results_m5_try1 <- decideTests(
    fit2_m5, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m5_try1)

}

The constraints are too tight; let’s relax them.

if(METHOD_FEATURE_FLAG == 5){
  # Identify DMPs, we will use this one:
  dmp_results_m5 <- decideTests(
    fit2_m5, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m5)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 5){

  significant_dmp_filter <- dmp_results_m5 != 0 
  significant_cpgs_m5_DMP <- rownames(dmp_results_m5)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m5_afterDMP<-c(phenotic_features_m5,significant_cpgs_m5_DMP)
  df_picked_m5<-df_picked_m5[,pickedFeatureName_m5_afterDMP]

  dim(df_picked_m5)
}
Visualize the results of DMP

The volcano plot axes, and the interpretation of LogFC and the B statistic, are the same as described for Method Three above. A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 5){
  full_results_m5 <- topTable(fit2_m5, number=Inf)
  full_results_m5 <- tibble::rownames_to_column(full_results_m5,"ID")
  head(full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  sorted_full_results_m5 <- full_results_m5[
    order(full_results_m5$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  library(ggplot2)
  ggplot(full_results_m5,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoff

if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m5 <- full_results_m5 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m5, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to the P value

if(METHOD_FEATURE_FLAG == 5){
  ggplot(full_results_m5,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m5 <- full_results_m5 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m5, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” - Process Data
if(METHOD_FEATURE_FLAG == 5){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m5) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m5)

  processed_data_m5 <- bake(rec_prep, new_data = df_picked_m5)
  processed_data_m5_df <- as.data.frame(processed_data_m5)
  rownames(processed_data_m5_df) <- rownames(df_picked_m5)
  print(dim(processed_data_m5))
}
if(METHOD_FEATURE_FLAG == 5){
  AfterProcess_FeatureName_m5<-colnames(processed_data_m5)
  print(length(AfterProcess_FeatureName_m5))
  head(AfterProcess_FeatureName_m5)
  tail(AfterProcess_FeatureName_m5)
}
if(METHOD_FEATURE_FLAG == 5){
  levels(df_picked_m5$DX)
}
if(METHOD_FEATURE_FLAG == 5){
  lastColumn_NUM_m5<-dim(processed_data_m5)[2]
  last5Column_NUM_m5<-lastColumn_NUM_m5-5
  head(processed_data_m5[,last5Column_NUM_m5 :lastColumn_NUM_m5])
}
if(METHOD_FEATURE_FLAG == 5){
  print(levels(processed_data_m5$DX))
  print(dim(processed_data_m5))
}

(6) Method Six - MCI vs AD (Dementia)

In this method, only the MCI and AD (Dementia) classes will be considered.

if(METHOD_FEATURE_FLAG == 6){
  
  df_fs_method6<-clean_merged_df

}
Picking Features
if(METHOD_FEATURE_FLAG == 6){
  phenotic_features_m6<-c(
    "DX","age.now","PTGENDER","PC1","PC2","PC3")
  pickedFeatureName_m6<-c(phenotic_features_m6,featureName_CpGs)
  df_picked_m6<-df_fs_method6[,pickedFeatureName_m6]

  df_picked_m6$DX<-as.factor(df_picked_m6$DX)
  df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
  head(df_picked_m6[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
  dim(df_picked_m6)
}
Filter and Change to Classification with ‘MCI vs Dementia’
if(METHOD_FEATURE_FLAG == 6){
  df_picked_m6<-df_picked_m6 %>%  filter(DX != "CN") %>% droplevels()

  
  df_picked_m6$DX<-as.factor(df_picked_m6$DX)
  df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)

  head(df_picked_m6[1:10],n=3)

}
if(METHOD_FEATURE_FLAG == 6){
  print(dim(df_picked_m6))
  print(table(df_picked_m6$DX))
}
if(METHOD_FEATURE_FLAG == 6){
  df_fs_method6 <- df_fs_method6 %>%  filter(DX != "CN") %>% droplevels()
  df_fs_method6$DX<-as.factor(df_fs_method6$DX)
  print(head(df_fs_method6))
  print(dim(df_fs_method6))
}
Perform DMP - Use LIMMA
if(METHOD_FEATURE_FLAG == 6){
  pheno_data_m6 <- df_picked_m6[,phenotic_features_m6] 
  print(head(pheno_data_m6[,1:5],n=3))

  design_m6 <- model.matrix(~0 + .,data=pheno_data_m6)

  colnames(design_m6)[colnames(design_m6) == "DXDementia"] <- "Dementia"
  colnames(design_m6)[colnames(design_m6) == "DXMCI"] <- "MCI"

  print(head(design_m6))

  beta_values_m6 <- t(as.matrix(df_fs_method6[,featureName_CpGs]))

}

In order to perform the differential analysis - identifying Differentially Methylated Positions (DMPs) - we have to define the contrast that we are interested in. In method 6, we focus on two groups (MCI and Dementia), so there is one contrast of interest.

if(METHOD_FEATURE_FLAG == 6){

  fit_m6 <- lmFit(beta_values_m6, design_m6)
  head(fit_m6$coefficients)


  contrast.matrix <- makeContrasts(MCI - Dementia, levels = design_m6)
 
  fit2_m6 <- contrasts.fit(fit_m6, contrast.matrix)

  # Apply the empirical Bayes’ step to get our differential expression statistics and p-values.

  fit2_m6 <- eBayes(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  decideTests(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  dmp_results_m6_try1 <- decideTests(
    fit2_m6, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
  table(dmp_results_m6_try1)

}

The constraints are too tight; let’s relax them.

if(METHOD_FEATURE_FLAG == 6){
  # Identify DMPs, we will use this one:
  dmp_results_m6 <- decideTests(
    fit2_m6, lfc = 0.01, adjust.method = "none", p.value = 0.1)

  table(dmp_results_m6)
}
Final CpGs used after DMP
if(METHOD_FEATURE_FLAG == 6){

  significant_dmp_filter <- dmp_results_m6 != 0 
  significant_cpgs_m6_DMP <- rownames(dmp_results_m6)[
    apply(significant_dmp_filter, 1, any)]

  pickedFeatureName_m6_afterDMP<-c(phenotic_features_m6,significant_cpgs_m6_DMP)
  df_picked_m6<-df_picked_m6[,pickedFeatureName_m6_afterDMP]

  dim(df_picked_m6)
}
Visualize the results of DMP

The volcano plot axes, and the interpretation of LogFC and the B statistic, are the same as described for Method Three above. A characteristic “volcano” shape should be seen. Let’s look at the results:

if(METHOD_FEATURE_FLAG == 6){
  full_results_m6 <- topTable(fit2_m6, number=Inf)
  full_results_m6 <- tibble::rownames_to_column(full_results_m6,"ID")
  head(full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  sorted_full_results_m6 <- full_results_m6[
    order(full_results_m6$logFC, decreasing = TRUE), ]
  head(sorted_full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  library(ggplot2)
  ggplot(full_results_m6,aes(x = logFC, y=B)) + geom_point()
}

Now, let’s visualize the plot with the cutoff

if(METHOD_FEATURE_FLAG == 6){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m6 <- full_results_m6 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m6, aes(x = logFC, 
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}

Now, let’s change the y-axis to the P value

if(METHOD_FEATURE_FLAG == 6){
  ggplot(full_results_m6,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 6){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20

  full_results_m6 <- full_results_m6 %>%
      mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = rank(-abs(logFC)), 
             Label = ifelse(Rank <= topN, as.character(ID), ""))

  ggplot(full_results_m6, 
         aes(x = logFC, y = -log10(P.Value), 
             col = Significant, 
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
Use “Recipe” - Process Data
if(METHOD_FEATURE_FLAG == 6){
  
  library(recipes)

 
  rec <- recipe(DX ~ ., data = df_picked_m6) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes())%>%
    step_corr(all_predictors(), threshold = 0.7)

  rec_prep <- prep(rec, df_picked_m6)

  processed_data_m6 <- bake(rec_prep, new_data = df_picked_m6)
  processed_data_m6_df <- as.data.frame(processed_data_m6)
  rownames(processed_data_m6_df) <- rownames(df_picked_m6)
  print(dim(processed_data_m6))
}
if(METHOD_FEATURE_FLAG == 6){
  AfterProcess_FeatureName_m6<-colnames(processed_data_m6)
  print(length(AfterProcess_FeatureName_m6))
  head(AfterProcess_FeatureName_m6)
  tail(AfterProcess_FeatureName_m6)
}
if(METHOD_FEATURE_FLAG == 6){
  levels(df_picked_m6$DX)
}
if(METHOD_FEATURE_FLAG == 6){
  lastColumn_NUM_m6<-dim(processed_data_m6)[2]
  last5Column_NUM_m6<-lastColumn_NUM_m6-5
  head(processed_data_m6[,last5Column_NUM_m6 :lastColumn_NUM_m6])
}
if(METHOD_FEATURE_FLAG == 6){
  print(levels(processed_data_m6$DX))
  print(dim(processed_data_m6))
}

1.7 INPUT - Model Train

The name for “processed_data” can be one of:

  1. “processed_data_m1”, which uses method one to process the data.

  2. “processed_data_m2”, which uses method two to process the data; note that the features will be principal components.

  3. “processed_data_m3”, which uses method three to process the data. This method converts “DX” to a binary class: “CN” stays the same, and “MCI” and “Dementia” are converted to “CI”.

    Note that “processed_data_m3_df” is the data frame form of “processed_data_m3” with sample names as row names.

  4. “processed_data_m4”, which uses method four to process the data. This method filters “DX” (dropping the “MCI” class), limiting it to the CN and Dementia (AD) classes.

  5. “processed_data_m5”, which uses method five to process the data. This method filters “DX” (dropping the “Dementia” class), limiting it to the CN and MCI classes.

  6. “processed_data_m6”, which uses method six to process the data. This method filters “DX” (dropping the “CN” class), limiting it to the MCI and Dementia classes.

The name for “AfterProcess_FeatureName” (which includes the “DX” label) can be one of:

  1. “AfterProcess_FeatureName_m1”, the column names of the data frame processed with method one.
  2. “AfterProcess_FeatureName_m2”, the column names from the principal component method.
  3. “AfterProcess_FeatureName_m3”, the column names of the data frame processed with method three (binary “DX”: “CN” stays the same; “MCI” and “Dementia” become “CI”).
  4. “AfterProcess_FeatureName_m4”, the column names of the data frame processed with method four (“MCI” dropped; CN and Dementia (AD) classes only).
  5. “AfterProcess_FeatureName_m5”, the column names of the data frame processed with method five (“Dementia” dropped; CN and MCI classes only).
  6. “AfterProcess_FeatureName_m6”, the column names of the data frame processed with method six (“CN” dropped; MCI and Dementia classes only).
if(METHOD_FEATURE_FLAG==1){
  
  processed_dataFrame<-processed_data_m1_df
  processed_data<-processed_data_m1

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m1

  
}


if(METHOD_FEATURE_FLAG==2){
  
  # Note: method two never creates a "processed_data_m2_df"; after the PCA
  # step "processed_data_m2" is already a data frame, so use it directly.
  processed_dataFrame<-as.data.frame(processed_data_m2)
  processed_data<-processed_data_m2

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m2

}

if(METHOD_FEATURE_FLAG==3){
  
  processed_dataFrame<-processed_data_m3_df
  processed_data<-processed_data_m3

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m3

  
}

if(METHOD_FEATURE_FLAG==4){
  
  processed_dataFrame<-processed_data_m4_df
  processed_data<-processed_data_m4

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m4

  
}

if(METHOD_FEATURE_FLAG==5){
  
  processed_dataFrame<-processed_data_m5_df
  processed_data<-processed_data_m5

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m5

  
}

if(METHOD_FEATURE_FLAG==6){
  
  processed_dataFrame<-processed_data_m6_df
  processed_data<-processed_data_m6

  AfterProcess_FeatureName<-AfterProcess_FeatureName_m6

  
}
print(head(processed_dataFrame))
##                     age.now          PC1        PC2          PC3 cg18993517 cg13573375 cg17002338 cg02621446 cg24470466 cg08896901 cg23916408 cg12146221 cg25174111 cg05234269 cg14293999 cg14307563
## 200223270003_R02C01    82.4 -0.214185447 0.01470293 -0.014043316  0.2091538  0.8670419  0.9286251  0.8731313  0.7725300  0.3581911  0.1942275  0.2049284  0.8526503 0.93848584  0.2836710  0.1855966
## 200223270003_R03C01    78.6 -0.172761185 0.05745834  0.005055871  0.2665896  0.1733934  0.2684163  0.8095534  0.9041432  0.2467071  0.9154993  0.1814927  0.8573844 0.57461229  0.9172023  0.8916957
## 200223270003_R06C01    80.4 -0.003667305 0.08372861  0.029143653  0.2574003  0.8888246  0.2811103  0.7511582  0.1206738  0.9225209  0.8886255  0.8619250  0.2567745 0.02467208  0.9168166  0.8750052
## ... (output truncated; the remaining columns are additional CpG beta values and are omitted here)
## 200223270003_R02C01 0.07404605  0.7366541  0.8985067  0.0242134  0.8946108  0.8510103 0.51028134  0.5878977  0.7721937  0.2742789 0.55608424  0.8103611  0.6430473 0.66706843  0.2422052 0.03911678
## 200223270003_R03C01 0.04183445  0.8454827  0.8846201  0.8966100  0.2822953  0.8358482 0.88741539  0.8287506  0.8584529  0.7946153 0.07552381  0.7877006  0.6822115 0.57129408  0.6881735 0.60934160
## 200223270003_R06C01 0.05657120  0.0871902  0.8847132  0.8908661  0.8698740  0.8419471 0.02818501  0.8793344  0.8634018  0.8124316 0.79270858  0.7706165  0.5296903 0.05309659  0.2134634 0.88380243
##                     cg07951602 cg02389264 cg15586958 cg01153376 cg15600437 cg23352245 cg22542451 cg10701746 cg17386240 cg11540596 cg22666875 cg04156077 cg23177161 cg00648024 cg10681981 cg02668233
## 200223270003_R02C01  0.8842206  0.7813213  0.9058263  0.4872148  0.4885353  0.9377232  0.5884356  0.4795503  0.7473400  0.9238951  0.8177182  0.7321883  0.4151698 0.51410972  0.7035090  0.4708431
## 200223270003_R03C01  0.8766842  0.7900942  0.8957526  0.9639670  0.4894487  0.9375774  0.8337068  0.4868342  0.7144809  0.8926595  0.8291957  0.6865805  0.4586576 0.40202875  0.7382662  0.8841930
## 200223270003_R06C01  0.8918089  0.7789974  0.9121763  0.2242410  0.8551374  0.5932742  0.8125084  0.4927257  0.8074824  0.8820252  0.3694180  0.8501188  0.8287312 0.05579011  0.6971989  0.4575646
##                     cg11706829 cg02356645 cg00146240 cg24307368 cg04497611 cg24697433 cg18949721 cg07480955 cg12556569 cg22931151 cg14532717 cg13226272 cg08584917 cg06394820 cg17131279 cg07138269
## 200223270003_R02C01  0.8897234  0.5105903  0.6336151 0.64323677  0.9086359  0.9243095  0.2334245  0.3874638 0.06218231  0.9311023  0.5732280 0.02637249  0.5663205  0.8513195  0.1900637  0.5002290
## 200223270003_R03C01  0.5444785  0.5833923  0.8957183 0.34980461  0.8818513  0.6808390  0.2437792  0.3916889 0.03924599  0.9356702  0.1107638 0.54100016  0.9019732  0.8695521  0.7048637  0.9426707
## 200223270003_R06C01  0.5669449  0.5701428  0.1433218 0.02720398  0.5853116  0.6384606  0.2523095  0.4043390 0.48636893  0.9328614  0.6273416 0.44370701  0.9187789  0.4415020  0.1492861  0.5057781
##                     cg26081710 cg25758034 cg22112152 cg19301366 cg00819121 cg10091792 cg21507367 cg16779438 cg14710850 cg06118351 cg11019791 cg01910713 cg22535849 cg21757617 cg08857872 cg20678988
## 200223270003_R02C01  0.8751040  0.6114028  0.8476101  0.8831393  0.9207001  0.8670733  0.9268560  0.8826150  0.8048592  0.3633940  0.8112324  0.8573169  0.8847704 0.03652647  0.3395280  0.8438718
## 200223270003_R03C01  0.9198212  0.6649219  0.8014136  0.8072679  0.9281472  0.5864221  0.9290102  0.5466924  0.8090950  0.4714860  0.7831231  0.8538850  0.8609966 0.44299089  0.8181845  0.8548886
## 200223270003_R06C01  0.8801892  0.2393844  0.7897897  0.8796022  0.9327211  0.6087997  0.9039559  0.8629492  0.8285902  0.8655962  0.4353250  0.8110366  0.8808022 0.44725379  0.2970779  0.7786685
##                     cg16431720 cg02887598 cg16858433 cg12702014 cg01921484 cg00415024 cg16338321 cg12776173 cg18029737 cg02643260 cg25712921 cg03084184 cg04124201 cg01549082 cg26948066 cg09015880
## 200223270003_R02C01  0.7356099 0.04020908  0.9184356  0.7704049  0.9098550  0.4299553  0.5350242  0.1038804  0.9100454  0.8580487  0.2829848  0.8162981  0.8686421  0.2924138  0.4685225  0.5101716
## 200223270003_R03C01  0.8692449 0.67073881  0.9194211  0.7848681  0.9093137  0.3999122  0.8294062  0.8730635  0.9016634  0.8288883  0.6220919  0.7877128  0.3308589  0.7065693  0.5026045  0.8402106
## 200223270003_R06C01  0.8773137 0.73408417  0.9271632  0.8065993  0.9204487  0.7465084  0.4918708  0.7009491  0.7376586  0.8664623  0.6384003  0.4546397  0.3241613  0.2895440  0.9101976  0.8472063
##                     cg11133939 cg15700429 cg25277809 cg12421087 cg24634455 cg03359067 cg02225060 cg25169289 cg00512739 cg04109990 cg13368637 cg12279734 cg23066280 cg06880438 cg10666341 cg10240127
## 200223270003_R02C01  0.1282694  0.7879010  0.1632342  0.5647607  0.7796391  0.7998055  0.6828159  0.1100884  0.9337648  0.9014696  0.5597507  0.6435368 0.07247841  0.8285145  0.9046648  0.9250553
## 200223270003_R03C01  0.5920898  0.9114530  0.4913711  0.5399655  0.5188241  0.8628564  0.8265195  0.7667174  0.8863895  0.6476604  0.9100088  0.1494651 0.57174588  0.7988881  0.6731062  0.9403255
## 200223270003_R06C01  0.5127706  0.8838233  0.5952124  0.5400348  0.5325725  0.8144536  0.5209552  0.2264993  0.9242748  0.6692040  0.8739205  0.8760759 0.80814756  0.7839538  0.6443180  0.9056974
##                     cg23432430 cg16652920 cg12228670 cg19503462 cg07028768 cg26853071 cg06277607 cg11787167 cg17296678 cg06960717 cg00086247 cg09584650 cg27272246 cg10738049 cg12689021 cg21986118
## 200223270003_R02C01  0.9482702  0.9436000  0.8632174  0.7951675  0.4496851  0.4233820 0.10744587 0.03853894  0.8262635  0.7030978  0.1761275 0.08230254  0.8615873  0.5441211  0.7706828  0.6658175
## 200223270003_R03C01  0.9455418  0.9431222  0.8496212  0.4537684  0.8536078  0.7451354 0.09353494 0.04673831  0.5653917  0.7653402  0.2045043 0.09661586  0.8705287  0.5232715  0.7449475  0.6571296
## 200223270003_R06C01  0.9418716  0.9457161  0.8738949  0.6997359  0.8356936  0.4228079 0.09504696 0.32564508  0.5272971  0.7206218  0.6901217 0.52399749  0.8103777  0.4875473  0.7872237  0.7034445
##                     cg20208879 cg22741595 cg05850457 cg04664583 cg09216282 cg03982462 cg05064044 cg06715136 cg20803293 cg15501526 cg06833284 cg16571124 cg07158503 cg06371647 cg17671604 cg14175932
## 200223270003_R02C01 0.66986658  0.6525533  0.8183013  0.5572814  0.9349248  0.8562777  0.5672851  0.3400192 0.54933918  0.6362531  0.9125144  0.9282854  0.5777146  0.8336894  0.3134752  0.5746953
## 200223270003_R03C01 0.02423079  0.1730013  0.8313023  0.5881190  0.9244259  0.6023731  0.5358875  0.9259109 0.07935747  0.6319253  0.9003482  0.9206431  0.6203543  0.8198684  0.6325735  0.8779027
## 200223270003_R06C01 0.61769424  0.1550739  0.8161364  0.9352717  0.9263996  0.8778458  0.5273964  0.9079807 0.42191244  0.7435100  0.6097933  0.9276842  0.6236025  0.8069537  0.7054536  0.7288239
##                     cg03979311 cg15730644 cg18819889 cg25208881 cg08861434 cg04718469 cg17002719 cg17429539 cg08554146 cg00322003 cg14170504 cg07504457 cg18526121 cg11247378 cg05876883 cg23517115
## 200223270003_R02C01 0.86644909  0.4803181  0.9156157  0.1851956  0.8768306  0.8687522 0.04939181  0.7860900  0.8982080  0.1759911 0.54915621  0.7116230  0.4519781  0.1591185  0.9039064  0.2151144
## 200223270003_R03C01 0.06199853  0.4353906  0.9004455  0.9092286  0.4352647  0.7256813 0.40466475  0.7100923  0.8963074  0.5702070 0.02236650  0.6854539  0.4762313  0.7874849  0.9223308  0.9131440
## 200223270003_R06C01 0.72615553  0.8763048  0.9054439  0.9265502  0.8698813  0.8521881 0.51428089  0.7660838  0.8213878  0.3077122 0.02988245  0.7205633  0.4833367  0.4807942  0.4697980  0.8328364
##                     cg13405878 cg03672288 cg18816397 cg14687298 cg14627380 cg10864200 cg00154902 cg15098922 cg15985500 cg26901661 cg10039445 cg00004073 cg07634717 cg27452255 cg06697310 cg02631626
## 200223270003_R02C01  0.4549662  0.9235592  0.5472925 0.04206702  0.9455369  0.7380052  0.5137741  0.9286092  0.8555262  0.8951971  0.8833873 0.02928535  0.7483382  0.9001010  0.8454609  0.6280766
## 200223270003_R03C01  0.7858042  0.6718625  0.4940355 0.14813581  0.9258964  0.7421384  0.8540746  0.9027517  0.8312198  0.8754981  0.8954055 0.02787198  0.8254434  0.6593379  0.8653044  0.1951736
## 200223270003_R06C01  0.7583938  0.9007629  0.5337018 0.24260002  0.5789898  0.5945457  0.8188126  0.8525611  0.8492103  0.9021064  0.8832807 0.64576463  0.8181246  0.9012217  0.2405168  0.2699849
##                     cg17129965 cg06231502 cg09727210 cg18918831 cg21243064 cg27577781 cg20685672 cg03660162 cg17042243 DX
## 200223270003_R02C01  0.8972140  0.7784451  0.4240111  0.4891660  0.5191606  0.8143535  0.6712101  0.8691767  0.2502905 CI
## 200223270003_R03C01  0.8806673  0.7964278  0.8812928  0.5333801  0.9167649  0.8113185  0.7932091  0.5160770  0.2933475 CN
## 200223270003_R06C01  0.8857237  0.7706160  0.8493743  0.6406575  0.4862205  0.8144274  0.6613646  0.9026304  0.2725457 CN
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
print(dim(processed_dataFrame))
## [1] 648 314
print(length(AfterProcess_FeatureName))
## [1] 314
print(head(processed_data))
## # A tibble: 6 × 314
##   age.now      PC1        PC2      PC3 cg18993517 cg13573375 cg17002338 cg02621446 cg24470466 cg08896901 cg23916408 cg12146221 cg25174111 cg05234269 cg14293999 cg14307563 cg21209485 cg11331837
##     <dbl>    <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1    82.4 -0.214    0.0147    -0.0140      0.209       0.867      0.929      0.873      0.773      0.358      0.194      0.205      0.853     0.938       0.284      0.186      0.887     0.0369
## 2    78.6 -0.173    0.0575     0.00506     0.267       0.173      0.268      0.810      0.904      0.247      0.915      0.181      0.857     0.575       0.917      0.892      0.871     0.572 
## 3    80.4 -0.00367  0.0837     0.0291      0.257       0.889      0.281      0.751      0.121      0.923      0.889      0.862      0.257     0.0247      0.917      0.875      0.229     0.0318
## 4    78.2 -0.187   -0.0112    -0.0323      0.0945      0.131      0.271      0.877      0.927      0.342      0.887      0.124      0.190     0.565       0.919      0.898      0.235     0.0383
## 5    62.9  0.0268   0.0000165  0.0529      0.940       0.161      0.880      0.205      0.190      0.924      0.222      0.202      0.267     0.948       0.197      0.876      0.888     0.930 
## 6    80.7 -0.0379   0.0157    -0.00869     0.950       0.851      0.931      0.796      0.207      0.264      0.152      0.138      0.205     0.563       0.903      0.917      0.229     0.540 
## # ℹ 296 more variables: cg11187460 <dbl>, cg13653328 <dbl>, cg09451339 <dbl>, cg06961873 <dbl>, cg23159970 <dbl>, cg10788927 <dbl>, cg05392160 <dbl>, cg04540199 <dbl>, cg01608425 <dbl>,
## #   cg18285382 <dbl>, cg24851651 <dbl>, cg22071943 <dbl>, cg24643105 <dbl>, cg03549208 <dbl>, cg17653352 <dbl>, cg14252149 <dbl>, cg04831745 <dbl>, cg13372276 <dbl>, cg07640670 <dbl>,
## #   cg25879395 <dbl>, cg05593887 <dbl>, cg18339359 <dbl>, cg08283200 <dbl>, cg21783012 <dbl>, cg03221390 <dbl>, cg05321907 <dbl>, cg10985055 <dbl>, cg15912814 <dbl>, cg03327352 <dbl>,
## #   cg20300784 <dbl>, cg16733676 <dbl>, cg05130642 <dbl>, cg04462915 <dbl>, cg25649515 <dbl>, cg22169467 <dbl>, cg00553601 <dbl>, cg03749159 <dbl>, cg03088219 <dbl>, cg13038195 <dbl>,
## #   cg17738613 <dbl>, cg02772171 <dbl>, cg12240569 <dbl>, cg06012903 <dbl>, cg14192979 <dbl>, cg17906851 <dbl>, cg01933473 <dbl>, cg16089727 <dbl>, cg18857647 <dbl>, cg00767423 <dbl>,
## #   cg08041188 <dbl>, cg03737947 <dbl>, cg26679884 <dbl>, cg22305850 <dbl>, cg26846609 <dbl>, cg04867412 <dbl>, cg02823329 <dbl>, cg00295418 <dbl>, cg21139150 <dbl>, cg25059696 <dbl>,
## #   cg21388339 <dbl>, cg13815695 <dbl>, cg04888234 <dbl>, cg08096656 <dbl>, cg14780448 <dbl>, cg17723206 <dbl>, cg03600007 <dbl>, cg11438323 <dbl>, cg00322820 <dbl>, cg15535896 <dbl>, …
print(dim(processed_data))
## [1] 648 314
print(AfterProcess_FeatureName)
##   [1] "age.now"    "PC1"        "PC2"        "PC3"        "cg18993517" "cg13573375" "cg17002338" "cg02621446" "cg24470466" "cg08896901" "cg23916408" "cg12146221" "cg25174111" "cg05234269" "cg14293999"
##  [16] "cg14307563" "cg21209485" "cg11331837" "cg11187460" "cg13653328" "cg09451339" "cg06961873" "cg23159970" "cg10788927" "cg05392160" "cg04540199" "cg01608425" "cg18285382" "cg24851651" "cg22071943"
##  [31] "cg24643105" "cg03549208" "cg17653352" "cg14252149" "cg04831745" "cg13372276" "cg07640670" "cg25879395" "cg05593887" "cg18339359" "cg08283200" "cg21783012" "cg03221390" "cg05321907" "cg10985055"
##  [46] "cg15912814" "cg03327352" "cg20300784" "cg16733676" "cg05130642" "cg04462915" "cg25649515" "cg22169467" "cg00553601" "cg03749159" "cg03088219" "cg13038195" "cg17738613" "cg02772171" "cg12240569"
##  [61] "cg06012903" "cg14192979" "cg17906851" "cg01933473" "cg16089727" "cg18857647" "cg00767423" "cg08041188" "cg03737947" "cg26679884" "cg22305850" "cg26846609" "cg04867412" "cg02823329" "cg00295418"
##  [76] "cg21139150" "cg25059696" "cg21388339" "cg13815695" "cg04888234" "cg08096656" "cg14780448" "cg17723206" "cg03600007" "cg11438323" "cg00322820" "cg15535896" "cg18698799" "cg12501287" "cg01462799"
##  [91] "cg10738648" "cg23836570" "cg09785377" "cg16536985" "cg02122327" "cg12784167" "cg15633912" "cg02495179" "cg19471911" "cg20823859" "cg02078724" "cg04242342" "cg20981163" "cg00345083" "cg09247979"
## [106] "cg02246922" "cg20566384" "cg25436480" "cg06483046" "cg02550738" "cg01008088" "cg20078646" "cg11268585" "cg06864789" "cg04316537" "cg27224751" "cg00939409" "cg26983017" "cg15184869" "cg06403901"
## [121] "cg13387643" "cg04768387" "cg17268094" "cg01128042" "cg14507637" "cg16202259" "cg19799454" "cg08198851" "cg05891136" "cg04412904" "cg11227702" "cg18150287" "cg12333628" "cg14168080" "cg27160885"
## [136] "cg05161773" "cg25306893" "cg14181112" "cg02932958" "cg00962106" "cg08745107" "cg01662749" "cg11286989" "cg15775217" "cg24139837" "cg04645024" "cg01280698" "cg11314779" "cg21697769" "cg13739190"
## [151] "cg12543766" "cg09120722" "cg27070288" "cg16715186" "cg00696044" "cg00084271" "cg24883219" "cg02627240" "cg20673830" "cg08788093" "cg07951602" "cg02389264" "cg15586958" "cg01153376" "cg15600437"
## [166] "cg23352245" "cg22542451" "cg10701746" "cg17386240" "cg11540596" "cg22666875" "cg04156077" "cg23177161" "cg00648024" "cg10681981" "cg02668233" "cg11706829" "cg02356645" "cg00146240" "cg24307368"
## [181] "cg04497611" "cg24697433" "cg18949721" "cg07480955" "cg12556569" "cg22931151" "cg14532717" "cg13226272" "cg08584917" "cg06394820" "cg17131279" "cg07138269" "cg26081710" "cg25758034" "cg22112152"
## [196] "cg19301366" "cg00819121" "cg10091792" "cg21507367" "cg16779438" "cg14710850" "cg06118351" "cg11019791" "cg01910713" "cg22535849" "cg21757617" "cg08857872" "cg20678988" "cg16431720" "cg02887598"
## [211] "cg16858433" "cg12702014" "cg01921484" "cg00415024" "cg16338321" "cg12776173" "cg18029737" "cg02643260" "cg25712921" "cg03084184" "cg04124201" "cg01549082" "cg26948066" "cg09015880" "cg11133939"
## [226] "cg15700429" "cg25277809" "cg12421087" "cg24634455" "cg03359067" "cg02225060" "cg25169289" "cg00512739" "cg04109990" "cg13368637" "cg12279734" "cg23066280" "cg06880438" "cg10666341" "cg10240127"
## [241] "cg23432430" "cg16652920" "cg12228670" "cg19503462" "cg07028768" "cg26853071" "cg06277607" "cg11787167" "cg17296678" "cg06960717" "cg00086247" "cg09584650" "cg27272246" "cg10738049" "cg12689021"
## [256] "cg21986118" "cg20208879" "cg22741595" "cg05850457" "cg04664583" "cg09216282" "cg03982462" "cg05064044" "cg06715136" "cg20803293" "cg15501526" "cg06833284" "cg16571124" "cg07158503" "cg06371647"
## [271] "cg17671604" "cg14175932" "cg03979311" "cg15730644" "cg18819889" "cg25208881" "cg08861434" "cg04718469" "cg17002719" "cg17429539" "cg08554146" "cg00322003" "cg14170504" "cg07504457" "cg18526121"
## [286] "cg11247378" "cg05876883" "cg23517115" "cg13405878" "cg03672288" "cg18816397" "cg14687298" "cg14627380" "cg10864200" "cg00154902" "cg15098922" "cg15985500" "cg26901661" "cg10039445" "cg00004073"
## [301] "cg07634717" "cg27452255" "cg06697310" "cg02631626" "cg17129965" "cg06231502" "cg09727210" "cg18918831" "cg21243064" "cg27577781" "cg20685672" "cg03660162" "cg17042243" "DX"
print("Number of Features :")
## [1] "Number of Features :"
Num_feaForProcess = length(AfterProcess_FeatureName)-1 # exclude the "DX" label
print(Num_feaForProcess) 
## [1] 313

2. Logistic Regression Model

2.1 Logistic Regression Model Training

df_LRM1<-processed_data 
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)

set.seed(123)  # for reproducibility
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 314
dim(testData)
## [1] 194 314
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_modelTrain_LRM1 <- caret::confusionMatrix(predictions, testData$DX)

print(cm_modelTrain_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 123  19
##         CN   5  47
##                                           
##                Accuracy : 0.8763          
##                  95% CI : (0.8215, 0.9191)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 4.641e-12       
##                                           
##                   Kappa : 0.7095          
##                                           
##  Mcnemar's Test P-Value : 0.007963        
##                                           
##             Sensitivity : 0.9609          
##             Specificity : 0.7121          
##          Pos Pred Value : 0.8662          
##          Neg Pred Value : 0.9038          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6340          
##    Detection Prevalence : 0.7320          
##       Balanced Accuracy : 0.8365          
##                                           
##        'Positive' Class : CI              
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_modelTrain_LRM1_Accuracy<-cm_modelTrain_LRM1$overall["Accuracy"]
cm_modelTrain_LRM1_Kappa<-cm_modelTrain_LRM1$overall["Kappa"]
print(cm_modelTrain_LRM1_Accuracy)
##  Accuracy 
## 0.8762887
print(cm_modelTrain_LRM1_Kappa)
##     Kappa 
## 0.7095084
print(model_LRM1)
## glmnet 
## 
## 454 samples
## 313 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001769938  0.7599023  0.4526141
##   0.10   0.0017699384  0.7686691  0.4684706
##   0.10   0.0176993845  0.7707937  0.4674205
##   0.55   0.0001769938  0.7466911  0.4270706
##   0.55   0.0017699384  0.7334554  0.4012873
##   0.55   0.0176993845  0.6958974  0.2927531
##   1.00   0.0001769938  0.7158730  0.3658615
##   1.00   0.0017699384  0.7202198  0.3706121
##   1.00   0.0176993845  0.6541392  0.1764299
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01769938.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.997797356828194"
modelTrain_LRM1_trainAccuracy<-train_accuracy

print(modelTrain_LRM1_trainAccuracy)
## [1] 0.9977974
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
modelTrain_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(modelTrain_mean_accuracy_cv_LRM1)
## [1] 0.7295157
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
 
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)
  modelTrain_LRM1_AUC <- auc_value


  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6 ){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
 
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)
  modelTrain_LRM1_AUC <- auc_value


  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
 
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)
  modelTrain_LRM1_AUC <- auc_value


  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9149
## [1] "The auc value is:"
## Area under the curve: 0.9149
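The three binary branches above (flags 3 to 6) differ only in which probability column is treated as the positive class. A hypothetical helper (a sketch, not part of the original pipeline; it assumes caret and pROC are loaded, as above) could consolidate them:

plot_binary_roc <- function(model, testData, positive_class) {
  # predicted class probabilities on the held-out test set
  prob_predictions <- predict(model, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, positive_class],
                   levels = rev(levels(testData$DX)))
  print(roc_curve)
  print("The auc value is:")
  print(roc_curve$auc)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  invisible(roc_curve$auc)
}
# e.g. modelTrain_LRM1_AUC <- plot_binary_roc(model_LRM1, testData, "CI")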

if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_LRM1_AUC <- mean_auc
}
print(modelTrain_LRM1_AUC)
## Area under the curve: 0.9149
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 313)
## 
##            Overall
## PC3         100.00
## PC1          66.84
## cg23432430   49.20
## cg09727210   47.85
## PC2          43.08
## cg00962106   41.16
## cg07158503   40.42
## cg06697310   40.25
## cg02225060   35.49
## cg09015880   35.48
## cg10701746   34.84
## cg16338321   33.99
## cg00819121   32.46
## cg26081710   32.36
## cg00415024   31.28
## cg21757617   30.74
## cg14168080   30.58
## cg02887598   29.89
## cg05064044   29.82
## cg01910713   28.88
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
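By default caret::varImp() rescales the importances so that the top predictor scores 100 (hence PC3 = 100.00 above). To inspect the unscaled values instead, a quick sketch using the same function:

importance_model_LRM1_raw <- varImp(model_LRM1, scale = FALSE)
print(importance_model_LRM1_raw)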

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)

ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
##          Overall
## 1   3.6285180181
## 2   2.4253396315
## 3   1.7851949468
## 4   1.7363184604
## 5   1.5632207146
## … (rows 6-271 omitted: absolute glmnet coefficients declining from 1.49 to 0.0003)
## … (rows 272-313 are all 0.0000000000: predictors given zero weight by the final model)
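Note that the ordered importance table above shows only row indices: dplyr::arrange() drops data-frame row names, so the CpG identifiers are lost in the sort. A minimal sketch that preserves them (assuming the tibble package is installed; the same fix applies to the corresponding tables for the later models):

importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
ordered_importance_named <- importance_final_model_LRM1 %>%
  tibble::rownames_to_column("Feature") %>%
  arrange(desc(Overall))
print(head(ordered_importance_named, 20))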
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
if (!requireNamespace("reshape2", quietly = TRUE)) {
  install.packages("reshape2")
}
library(reshape2)

if(METHOD_FEATURE_FLAG == 1){
  library(ggplot2)  # attach ggplot2 for the plot below (no-op if already attached)
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
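The melt-and-plot pattern for the multi-class importance is repeated for each model below; a small helper (hypothetical name, same dplyr/reshape2/ggplot2 calls as above) would avoid the duplication:

plot_importance_by_class <- function(imp_df, top_n = nrow(imp_df)) {
  melted <- imp_df %>%
    head(top_n) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(melted, aes(x = reorder(Feature, -Importance),
                     y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature", y = "Importance", fill = "Class") +
    theme_minimal()
}
# e.g. plot_importance_by_class(importance_model_LRM1_df, top_n = 20)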

2.2 Model Diagnosis & Improvement

2.2.1 Class Imbalance

Class Imbalance Check

  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##  CI  CN 
## 427 221
prop.table(table(df_LRM1$DX))
## 
##        CI        CN 
## 0.6589506 0.3410494
table(trainData$DX)
## 
##  CI  CN 
## 299 155
prop.table(table(trainData$DX))
## 
##        CI        CN 
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio, which is the ratio of the number of samples in the majority class to the number of samples in the minority class; a high ratio indicates severe class imbalance.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance radio of the whole data set is:")
    ## [1] "The imbalance radio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 1.932127
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance radio of the training data set is:")
    ## [1] "The imbalance radio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 1.929032
  • Let’s run a Chi-squared test, which can determine whether the class distribution deviates significantly from a balanced distribution; the p-value reported by the test indicates how significant the class imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 65.488, df = 1, p-value = 5.848e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 45.674, df = 1, p-value = 1.397e-11

Addressing Class Imbalance with “SMOTE” (NOT FINALIZED; MAY NEED FURTHER IMPROVEMENT)

library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)


balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##  CI  CN 
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 314
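With dup_size = 1 the classes happen to end up roughly balanced here (155 CN samples doubled to 310 vs 299 CI) because CI is about twice CN. A sketch (an assumption, not the original call) that derives dup_size from the observed class counts instead of hard-coding it:

counts <- table(trainData$DX)
dup_size_auto <- max(1, round(max(counts) / min(counts)) - 1)  # here: round(299/155) - 1 = 1
smote_data_auto <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                         target = trainData$DX, K = 5, dup_size = dup_size_auto)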

Fit Model with Balanced Data

ctrl <- trainControl(method = "cv", number = 5)


model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)


predictions <- predict(model_LRM2, newdata = testData)
cm_modelTrain_LRM2<-caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM2)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 116  13
##         CN  12  53
##                                           
##                Accuracy : 0.8711          
##                  95% CI : (0.8157, 0.9148)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 1.656e-11       
##                                           
##                   Kappa : 0.7119          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9062          
##             Specificity : 0.8030          
##          Pos Pred Value : 0.8992          
##          Neg Pred Value : 0.8154          
##              Prevalence : 0.6598          
##          Detection Rate : 0.5979          
##    Detection Prevalence : 0.6649          
##       Balanced Accuracy : 0.8546          
##                                           
##        'Positive' Class : CI              
## 
cm_modelTrain_LRM2_Accuracy<-cm_modelTrain_LRM2$overall["Accuracy"]
cm_modelTrain_LRM2_Kappa<-cm_modelTrain_LRM2$overall["Kappa"]
print(cm_modelTrain_LRM2_Accuracy)
## Accuracy 
## 0.871134
print(cm_modelTrain_LRM2_Kappa)
##     Kappa 
## 0.7118926
print(model_LRM2)
## glmnet 
## 
## 609 samples
## 313 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 487, 487, 487, 487, 488 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa    
##   0.10   0.000214353  0.8735131  0.7461978
##   0.10   0.002143530  0.8767918  0.7528013
##   0.10   0.021435296  0.8784176  0.7560836
##   0.55   0.000214353  0.8702344  0.7396263
##   0.55   0.002143530  0.8685544  0.7362837
##   0.55   0.021435296  0.8340875  0.6673492
##   1.00   0.000214353  0.8521881  0.7033964
##   1.00   0.002143530  0.8521474  0.7034607
##   1.00   0.021435296  0.7749763  0.5489226
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0214353.
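# Note: the training accuracy below is computed on the original (pre-SMOTE)
# trainData, not on the balanced_data_LGR_1 set that model_LRM2 was fitted to.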
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")


train_accuracy <- mean(train_predictions == trainData$DX)

modelTrain_LRM2_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", modelTrain_LRM2_trainAccuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8534345
modelTrain_LRM2_mean_accuracy_model_LRM2 <- mean_accuracy_model_LRM2
print(modelTrain_LRM2_mean_accuracy_model_LRM2)
## [1] 0.8534345
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 313)
## 
##            Overall
## PC3         100.00
## PC1          45.65
## cg23432430   35.91
## cg09727210   33.49
## PC2          32.09
## cg00962106   30.88
## cg06697310   30.37
## cg07158503   29.15
## cg10701746   27.04
## cg26081710   26.68
## cg02225060   25.91
## cg09015880   25.77
## cg21757617   25.45
## cg00819121   25.27
## cg16338321   24.73
## cg00415024   24.47
## cg07504457   24.31
## cg14168080   23.03
## cg05064044   22.62
## cg16858433   22.51
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3||METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG ==5 || METHOD_FEATURE_FLAG == 6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)  
  
}
##        Overall
## 1   4.73724133
## 2   2.16271426
## 3   1.70095331
## 4   1.58633106
## 5   1.52033522
## … (rows 6-268 omitted: absolute glmnet coefficients declining from 1.46 to 0.014)
## … (rows 269-313 are all 0.00000000: predictors given zero weight by the final model)
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  modelTrain_LRM2_AUC <-auc_value

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9097
## [1] "The auc value is:"
## Area under the curve: 0.9097

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_LRM2_AUC <- mean_auc
}
print(modelTrain_LRM2_AUC)
## Area under the curve: 0.9097
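Compared with model_LRM1, the SMOTE-balanced model gives up a little test accuracy (0.8711 vs 0.8763) and AUC (0.9097 vs 0.9149) but improves specificity on the minority CN class (0.8030 vs 0.7121), which is the usual trade-off when oversampling the minority class.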

3. Elastic Net

3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)

set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 454 samples
## 313 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.7862515  0.50620659
##   0      0.05357895  0.7950183  0.52074555
##   0      0.10615789  0.8038584  0.53160822
##   0      0.15873684  0.8060806  0.53493479
##   0      0.21131579  0.8016606  0.52036722
##   0      0.26389474  0.7994383  0.51283285
##   0      0.31647368  0.7950427  0.49585158
##   0      0.36905263  0.7884249  0.47734282
##   0      0.42163158  0.7928694  0.48613573
##   0      0.47421053  0.7862271  0.46366075
##   0      0.52678947  0.7884249  0.46671213
##   0      0.57936842  0.7818071  0.44556438
##   0      0.63194737  0.7818315  0.44299981
##   0      0.68452632  0.7752381  0.42142882
##   0      0.73710526  0.7774603  0.42422765
##   0      0.78968421  0.7730647  0.41122351
##   0      0.84226316  0.7708669  0.40431346
##   0      0.89484211  0.7664713  0.39134080
##   0      0.94742105  0.7576801  0.36415459
##   0      1.00000000  0.7444444  0.32221545
##   1      0.00100000  0.7224420  0.37390194
##   1      0.05357895  0.6564103  0.01558753
##   1      0.10615789  0.6585836  0.00000000
##   1      0.15873684  0.6585836  0.00000000
##   1      0.21131579  0.6585836  0.00000000
##   1      0.26389474  0.6585836  0.00000000
##   1      0.31647368  0.6585836  0.00000000
##   1      0.36905263  0.6585836  0.00000000
##   1      0.42163158  0.6585836  0.00000000
##   1      0.47421053  0.6585836  0.00000000
##   1      0.52678947  0.6585836  0.00000000
##   1      0.57936842  0.6585836  0.00000000
##   1      0.63194737  0.6585836  0.00000000
##   1      0.68452632  0.6585836  0.00000000
##   1      0.73710526  0.6585836  0.00000000
##   1      0.78968421  0.6585836  0.00000000
##   1      0.84226316  0.6585836  0.00000000
##   1      0.89484211  0.6585836  0.00000000
##   1      0.94742105  0.6585836  0.00000000
##   1      1.00000000  0.6585836  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.1587368.
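Note that param_grid above contains only alpha = 0 (pure ridge) and alpha = 1 (pure lasso), since 0:1 expands to c(0, 1); the search therefore never evaluates a true elastic-net mixture, and here the ridge end wins. A sketch of a finer grid (an adjustment, not the original tuning) that covers intermediate mixing:

param_grid_mixed <- expand.grid(alpha = seq(0, 1, by = 0.25),
                                lambda = seq(0.001, 1, length = 20))
# elastic_net_model1_mixed <- caret::train(DX ~ ., data = trainData_ENM1,
#                                          method = "glmnet",
#                                          trControl = ctrl, tuneGrid = param_grid_mixed)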
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.722638
modelTrain_mean_accuracy_cv_ENM1 <- mean_accuracy_elastic_net_model1
print(modelTrain_mean_accuracy_cv_ENM1)
## [1] 0.722638
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

modelTrain_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.973568281938326"
print(modelTrain_ENM1_trainAccuracy)
## [1] 0.9735683
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_modelTrain_ENM1<- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_modelTrain_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 126  18
##         CN   2  48
##                                           
##                Accuracy : 0.8969          
##                  95% CI : (0.8453, 0.9359)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 1.772e-14       
##                                           
##                   Kappa : 0.756           
##                                           
##  Mcnemar's Test P-Value : 0.0007962       
##                                           
##             Sensitivity : 0.9844          
##             Specificity : 0.7273          
##          Pos Pred Value : 0.8750          
##          Neg Pred Value : 0.9600          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6495          
##    Detection Prevalence : 0.7423          
##       Balanced Accuracy : 0.8558          
##                                           
##        'Positive' Class : CI              
## 
cm_modelTrain_ENM1_Accuracy <- cm_modelTrain_ENM1$overall["Accuracy"]
print(cm_modelTrain_ENM1_Accuracy)
##  Accuracy 
## 0.8969072
cm_modelTrain_ENM1_Kappa <- cm_modelTrain_ENM1$overall["Kappa"]
print(cm_modelTrain_ENM1_Kappa)
##     Kappa 
## 0.7560362
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 313)
## 
##            Overall
## PC3         100.00
## PC2          84.53
## PC1          75.99
## cg23432430   62.27
## cg00962106   52.02
## cg07158503   51.14
## cg06697310   49.97
## cg09727210   48.46
## cg02225060   47.85
## cg06277607   42.73
## cg16338321   42.67
## cg26081710   40.29
## cg21757617   39.97
## cg27272246   38.53
## cg09015880   38.02
## cg00819121   37.58
## cg02887598   37.38
## cg05064044   37.05
## cg00004073   36.87
## cg17429539   36.87
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 ||METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG==6){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)

# Keep the feature names: arrange() drops data.frame rownames, so store them in a column first.
importance_elastic_net_final_model1$Feature <- rownames(importance_elastic_net_final_model1)

Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))

print(Ordered_importance_elastic_net_final_model1)
  
}
##         Overall
## 1   1.175115079
## 2   0.993654080
## 3   0.893532902
## 4   0.732676077
## 5   0.612474354
## 6   0.602081072
## 7   0.588338283
## 8   0.570704740
## 9   0.563482510
## 10  0.503541109
## 11  0.502798118
## 12  0.474902373
## 13  0.471083184
## 14  0.454211128
## 15  0.448210568
## 16  0.443055674
## 17  0.440722143
## 18  0.436918720
## 19  0.434818554
## 20  0.434715737
## 21  0.434680710
## 22  0.433972215
## 23  0.430411586
## 24  0.427929924
## 25  0.422215493
## 26  0.422198375
## 27  0.419095951
## 28  0.416391897
## 29  0.410552311
## 30  0.409248352
## 31  0.406698264
## 32  0.404806291
## 33  0.402903699
## 34  0.400805246
## 35  0.394486618
## 36  0.390668734
## 37  0.389900967
## 38  0.389340890
## 39  0.387621502
## 40  0.385004177
## 41  0.380935907
## 42  0.379773537
## 43  0.377186104
## 44  0.376027600
## 45  0.375109969
## 46  0.373930293
## 47  0.372686038
## 48  0.372235546
## 49  0.370924728
## 50  0.370576575
## 51  0.367976075
## 52  0.366223885
## 53  0.364300270
## 54  0.360907884
## 55  0.360046674
## 56  0.359431713
## 57  0.358367717
## 58  0.354059041
## 59  0.349875077
## 60  0.348220588
## 61  0.347558794
## 62  0.346970910
## 63  0.343088621
## 64  0.342627997
## 65  0.342170666
## 66  0.339101895
## 67  0.338357484
## 68  0.337835302
## 69  0.336324945
## 70  0.335963341
## 71  0.333872881
## 72  0.331817569
## 73  0.331371663
## 74  0.330942610
## 75  0.328639393
## 76  0.323771410
## 77  0.322292263
## 78  0.319676434
## 79  0.319544642
## 80  0.318308210
## 81  0.317808720
## 82  0.316732131
## 83  0.315560112
## 84  0.314822802
## 85  0.313529934
## 86  0.313502513
## 87  0.313123191
## 88  0.312863194
## 89  0.310627304
## 90  0.308590929
## 91  0.307962148
## 92  0.305055661
## 93  0.304979611
## 94  0.295482246
## 95  0.294603886
## 96  0.293745021
## 97  0.292779535
## 98  0.292182519
## 99  0.290525042
## 100 0.290328174
## 101 0.289918614
## 102 0.289438703
## 103 0.287572777
## 104 0.286764062
## 105 0.286579341
## 106 0.284952314
## 107 0.284593118
## 108 0.283601627
## 109 0.281920938
## 110 0.280832312
## 111 0.280811790
## 112 0.279237741
## 113 0.277767873
## 114 0.275541019
## 115 0.275374541
## 116 0.274396251
## 117 0.272194143
## 118 0.272080417
## 119 0.267135990
## 120 0.266712998
## 121 0.264778461
## 122 0.263257295
## 123 0.260048087
## 124 0.257631351
## 125 0.257106538
## 126 0.256870433
## 127 0.256252867
## 128 0.254441406
## 129 0.252879288
## 130 0.249836435
## 131 0.249457861
## 132 0.248673243
## 133 0.246920121
## 134 0.246537427
## 135 0.245847303
## 136 0.245730549
## 137 0.244117616
## 138 0.242509172
## 139 0.241886236
## 140 0.241358458
## 141 0.237364368
## 142 0.235978103
## 143 0.233925841
## 144 0.233586316
## 145 0.231610922
## 146 0.230528569
## 147 0.228792860
## 148 0.228302971
## 149 0.227471071
## 150 0.225862006
## 151 0.224066021
## 152 0.223866878
## 153 0.223218772
## 154 0.223133707
## 155 0.221793118
## 156 0.221349475
## 157 0.220952803
## 158 0.220182381
## 159 0.219184432
## 160 0.218269681
## 161 0.215903421
## 162 0.213164979
## 163 0.212470769
## 164 0.211823176
## 165 0.210241885
## 166 0.209718601
## 167 0.209499987
## 168 0.209444836
## 169 0.207098152
## 170 0.206187830
## 171 0.204908065
## 172 0.204676885
## 173 0.204514407
## 174 0.204180849
## 175 0.203066027
## 176 0.202829795
## 177 0.202823903
## 178 0.201434203
## 179 0.201361120
## 180 0.200291628
## 181 0.199515674
## 182 0.198575683
## 183 0.198554091
## 184 0.195497345
## 185 0.195345198
## 186 0.194356872
## 187 0.193133432
## 188 0.191480434
## 189 0.189610131
## 190 0.189478770
## 191 0.188702000
## 192 0.188663065
## 193 0.187063414
## 194 0.184895132
## 195 0.184647013
## 196 0.184356035
## 197 0.182974794
## 198 0.181193135
## 199 0.179552584
## 200 0.179143297
## 201 0.177891278
## 202 0.175960746
## 203 0.175266287
## 204 0.174684681
## 205 0.174609454
## 206 0.173276914
## 207 0.172729225
## 208 0.167349512
## 209 0.165650393
## 210 0.163894082
## 211 0.163135971
## 212 0.162721369
## 213 0.159540253
## 214 0.158955036
## 215 0.157729754
## 216 0.157503697
## 217 0.157034142
## 218 0.156901777
## 219 0.156695500
## 220 0.156357021
## 221 0.151742081
## 222 0.151421928
## 223 0.151087433
## 224 0.150404632
## 225 0.149817571
## 226 0.149598423
## 227 0.145446990
## 228 0.144231745
## 229 0.143096593
## 230 0.142880919
## 231 0.141833674
## 232 0.141317619
## 233 0.140169056
## 234 0.139551793
## 235 0.138351158
## 236 0.137248579
## 237 0.136001213
## 238 0.134300553
## 239 0.134299757
## 240 0.134052000
## 241 0.133116228
## 242 0.131767425
## 243 0.131064823
## 244 0.130556693
## 245 0.130154319
## 246 0.128736660
## 247 0.127691082
## 248 0.124575154
## 249 0.124533911
## 250 0.124189821
## 251 0.121457115
## 252 0.121271345
## 253 0.120158593
## 254 0.119486520
## 255 0.115466054
## 256 0.112678325
## 257 0.112454110
## 258 0.111654603
## 259 0.110686563
## 260 0.106876769
## 261 0.104774506
## 262 0.104372371
## 263 0.104297384
## 264 0.102104015
## 265 0.101892318
## 266 0.100105442
## 267 0.099500057
## 268 0.098368171
## 269 0.096062061
## 270 0.094860869
## 271 0.091132640
## 272 0.090185789
## 273 0.089119388
## 274 0.088131999
## 275 0.087623209
## 276 0.084493797
## 277 0.084458536
## 278 0.082490173
## 279 0.082237386
## 280 0.082228018
## 281 0.080509490
## 282 0.073850360
## 283 0.070585757
## 284 0.068368220
## 285 0.065589030
## 286 0.065122020
## 287 0.063538752
## 288 0.063498784
## 289 0.063469176
## 290 0.062957120
## 291 0.061269421
## 292 0.059626725
## 293 0.058543870
## 294 0.058208552
## 295 0.054151316
## 296 0.052344472
## 297 0.052168684
## 298 0.047801344
## 299 0.043828013
## 300 0.041454820
## 301 0.036520783
## 302 0.036015991
## 303 0.027482469
## 304 0.025601429
## 305 0.021803023
## 306 0.020724048
## 307 0.015792103
## 308 0.014780493
## 309 0.011997938
## 310 0.006688169
## 311 0.006359318
## 312 0.005485135
## 313 0.002381055
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG ==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_ENM1_AUC <-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.9458

if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_ENM1_AUC <-mean_auc
}
print(modelTrain_ENM1_AUC)
## Area under the curve: 0.9458

4. XGBoost

4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
# Start point of parallel processing
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 454 samples
## 313 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa       
##   0.3  1          0.6               0.50        50      0.6012210   0.018240514
##   0.3  1          0.6               0.50       100      0.6055678   0.058191426
##   0.3  1          0.6               0.50       150      0.6320879   0.142055824
##   0.3  1          0.6               0.75        50      0.6409524   0.078806663
##   0.3  1          0.6               0.75       100      0.6410012   0.118200965
##   0.3  1          0.6               0.75       150      0.6541880   0.137896928
##   0.3  1          0.6               1.00        50      0.5880342  -0.070374087
##   0.3  1          0.6               1.00       100      0.6035653   0.006021368
##   0.3  1          0.6               1.00       150      0.6144567   0.039738408
##   0.3  1          0.8               0.50        50      0.6124542   0.066273466
##   0.3  1          0.8               0.50       100      0.6586325   0.181792922
##   0.3  1          0.8               0.50       150      0.6828083   0.242903429
##   0.3  1          0.8               0.75        50      0.6012698  -0.001435643
##   0.3  1          0.8               0.75       100      0.6276679   0.078829544
##   0.3  1          0.8               0.75       150      0.6232479   0.084532971
##   0.3  1          0.8               1.00        50      0.6079365  -0.029532168
##   0.3  1          0.8               1.00       100      0.6189988   0.032152110
##   0.3  1          0.8               1.00       150      0.6079853   0.032642931
##   0.3  2          0.6               0.50        50      0.6475946   0.131147712
##   0.3  2          0.6               0.50       100      0.6432479   0.102854477
##   0.3  2          0.6               0.50       150      0.6586569   0.143202170
##   0.3  2          0.6               0.75        50      0.6166789   0.052627786
##   0.3  2          0.6               0.75       100      0.6299145   0.070648048
##   0.3  2          0.6               0.75       150      0.6100366   0.034797450
##   0.3  2          0.6               1.00        50      0.6056410   0.004372945
##   0.3  2          0.6               1.00       100      0.6144078   0.040099634
##   0.3  2          0.6               1.00       150      0.6342857   0.070997992
##   0.3  2          0.8               0.50        50      0.6322100   0.101269091
##   0.3  2          0.8               0.50       100      0.6674237   0.211918172
##   0.3  2          0.8               0.50       150      0.6718926   0.217145929
##   0.3  2          0.8               0.75        50      0.6079121   0.017187490
##   0.3  2          0.8               0.75       100      0.6124054   0.016675210
##   0.3  2          0.8               0.75       150      0.6168498   0.026072802
##   0.3  2          0.8               1.00        50      0.6123321   0.013856988
##   0.3  2          0.8               1.00       100      0.6321123   0.068676845
##   0.3  2          0.8               1.00       150      0.6475458   0.109937123
##   0.3  3          0.6               0.50        50      0.6786569   0.205671560
##   0.3  3          0.6               0.50       100      0.6829792   0.222394030
##   0.3  3          0.6               0.50       150      0.7072283   0.271933929
##   0.3  3          0.6               0.75        50      0.6475702   0.123189329
##   0.3  3          0.6               0.75       100      0.6497436   0.127721397
##   0.3  3          0.6               0.75       150      0.6607326   0.159664842
##   0.3  3          0.6               1.00        50      0.6321856   0.043792614
##   0.3  3          0.6               1.00       100      0.6476190   0.085904571
##   0.3  3          0.6               1.00       150      0.6431990   0.081282761
##   0.3  3          0.8               0.50        50      0.6828327   0.215983183
##   0.3  3          0.8               0.50       100      0.6871795   0.222096767
##   0.3  3          0.8               0.50       150      0.6981685   0.255666694
##   0.3  3          0.8               0.75        50      0.6389011   0.087659333
##   0.3  3          0.8               0.75       100      0.6564835   0.137873309
##   0.3  3          0.8               0.75       150      0.6674725   0.169998819
##   0.3  3          0.8               1.00        50      0.6520147   0.093178940
##   0.3  3          0.8               1.00       100      0.6674725   0.139474684
##   0.3  3          0.8               1.00       150      0.6652503   0.131307036
##   0.4  1          0.6               0.50        50      0.6167277   0.082109043
##   0.4  1          0.6               0.50       100      0.6409280   0.146697270
##   0.4  1          0.6               0.50       150      0.6519170   0.171609925
##   0.4  1          0.6               0.75        50      0.6012698   0.033591049
##   0.4  1          0.6               0.75       100      0.6365812   0.124706091
##   0.4  1          0.6               0.75       150      0.6365812   0.140354883
##   0.4  1          0.6               1.00        50      0.5947009  -0.036670975
##   0.4  1          0.6               1.00       100      0.6100611   0.019710601
##   0.4  1          0.6               1.00       150      0.6078632   0.028453638
##   0.4  1          0.8               0.50        50      0.6497436   0.143903004
##   0.4  1          0.8               0.50       100      0.6628816   0.183617609
##   0.4  1          0.8               0.50       150      0.6540904   0.181250729
##   0.4  1          0.8               0.75        50      0.5858852   0.002546459
##   0.4  1          0.8               0.75       100      0.6410256   0.133937136
##   0.4  1          0.8               0.75       150      0.6607570   0.181258526
##   0.4  1          0.8               1.00        50      0.5902076  -0.032106357
##   0.4  1          0.8               1.00       100      0.6100611   0.023006259
##   0.4  1          0.8               1.00       150      0.6122589   0.047670623
##   0.4  2          0.6               0.50        50      0.6608059   0.195170043
##   0.4  2          0.6               0.50       100      0.6652259   0.185859827
##   0.4  2          0.6               0.50       150      0.6740904   0.206642880
##   0.4  2          0.6               0.75        50      0.6055922   0.032593241
##   0.4  2          0.6               0.75       100      0.6144811   0.064984931
##   0.4  2          0.6               0.75       150      0.6210745   0.076291889
##   0.4  2          0.6               1.00        50      0.6408791   0.082904404
##   0.4  2          0.6               1.00       100      0.6299878   0.058501094
##   0.4  2          0.6               1.00       150      0.6322100   0.072710378
##   0.4  2          0.8               0.50        50      0.6432479   0.151942575
##   0.4  2          0.8               0.50       100      0.6697436   0.189291268
##   0.4  2          0.8               0.50       150      0.6741636   0.211667595
##   0.4  2          0.8               0.75        50      0.6432479   0.123133613
##   0.4  2          0.8               0.75       100      0.6586569   0.162919635
##   0.4  2          0.8               0.75       150      0.6674481   0.184550029
##   0.4  2          0.8               1.00        50      0.6431746   0.096013783
##   0.4  2          0.8               1.00       100      0.6542125   0.108411031
##   0.4  2          0.8               1.00       150      0.6366056   0.069966742
##   0.4  3          0.6               0.50        50      0.6276679   0.082881544
##   0.4  3          0.6               0.50       100      0.6519658   0.142956134
##   0.4  3          0.6               0.50       150      0.6629792   0.176451936
##   0.4  3          0.6               0.75        50      0.6365812   0.090716228
##   0.4  3          0.6               0.75       100      0.6277167   0.069989364
##   0.4  3          0.6               0.75       150      0.6365568   0.088516395
##   0.4  3          0.6               1.00        50      0.6431746   0.087524193
##   0.4  3          0.6               1.00       100      0.6300122   0.046638358
##   0.4  3          0.6               1.00       150      0.6322344   0.058269839
##   0.4  3          0.8               0.50        50      0.6079853   0.039012378
##   0.4  3          0.8               0.50       100      0.6210989   0.070892718
##   0.4  3          0.8               0.50       150      0.6320879   0.102593312
##   0.4  3          0.8               0.75        50      0.6543346   0.115610534
##   0.4  3          0.8               0.75       100      0.6631746   0.140095927
##   0.4  3          0.8               0.75       150      0.6609280   0.130920788
##   0.4  3          0.8               1.00        50      0.6298901   0.063740487
##   0.4  3          0.8               1.00       100      0.6277656   0.041438805
##   0.4  3          0.8               1.00       150      0.6277411   0.048692065
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6381592
modelTrain_mean_accuracy_cv_xgb <- mean_accuracy_xgb_model
print(modelTrain_mean_accuracy_cv_xgb)
## [1] 0.6381592
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)

modelTrain_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", modelTrain_xgb_trainAccuracy))
## [1] "Training Accuracy:  1"
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_modelTrain_xgb <- caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_modelTrain_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 120  40
##         CN   8  26
##                                           
##                Accuracy : 0.7526          
##                  95% CI : (0.6857, 0.8116)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 0.003346        
##                                           
##                   Kappa : 0.3755          
##                                           
##  Mcnemar's Test P-Value : 7.66e-06        
##                                           
##             Sensitivity : 0.9375          
##             Specificity : 0.3939          
##          Pos Pred Value : 0.7500          
##          Neg Pred Value : 0.7647          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6186          
##    Detection Prevalence : 0.8247          
##       Balanced Accuracy : 0.6657          
##                                           
##        'Positive' Class : CI              
## 
cm_modelTrain_xgb_Accuracy <- cm_modelTrain_xgb$overall["Accuracy"]
cm_modelTrain_xgb_Kappa <- cm_modelTrain_xgb$overall["Kappa"]
print(cm_modelTrain_xgb_Accuracy)
##  Accuracy 
## 0.7525773
print(cm_modelTrain_xgb_Kappa)
##     Kappa 
## 0.3755365
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 313)
## 
##            Overall
## cg23432430  100.00
## age.now      63.50
## cg11438323   61.99
## cg11540596   57.57
## cg03660162   52.02
## cg17002719   45.38
## cg09120722   43.33
## cg17002338   43.12
## cg07158503   42.32
## cg14168080   41.50
## cg11227702   41.48
## cg07634717   41.15
## cg18816397   40.96
## cg02122327   40.53
## cg19799454   40.13
## cg13573375   39.19
## cg03088219   38.67
## PC2          38.04
## cg20678988   37.49
## cg11019791   37.10
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover   Frequency   Importance
##          <char>        <num>        <num>       <num>        <num>
##   1: cg23432430 3.027514e-02 0.0257936980 0.012216405 3.027514e-02
##   2:    age.now 1.922394e-02 0.0178941206 0.012216405 1.922394e-02
##   3: cg11438323 1.876836e-02 0.0148479474 0.005235602 1.876836e-02
##   4: cg11540596 1.742898e-02 0.0129093067 0.013961606 1.742898e-02
##   5: cg03660162 1.574834e-02 0.0084935366 0.008726003 1.574834e-02
##  ---                                                              
## 248: cg25649515 8.420077e-05 0.0007715400 0.001745201 8.420077e-05
## 249: cg07028768 8.255277e-05 0.0005230628 0.001745201 8.255277e-05
## 250: cg27224751 6.555581e-05 0.0004160068 0.001745201 6.555581e-05
## 251: cg18819889 6.258985e-05 0.0004043872 0.001745201 6.258985e-05
## 252: cg12776173 2.085415e-05 0.0004195156 0.001745201 2.085415e-05
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  modelTrain_xgb_AUC<-auc_value
  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7603

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    
    modelTrain_xgb_AUC<-mean_auc
}
print(modelTrain_xgb_AUC)
## Area under the curve: 0.7603

5. Random Forest

5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName
library(randomForest)

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 454 samples
## 313 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa      
##     2   0.6608059  0.008374734
##   157   0.6674481  0.045143239
##   313   0.6630281  0.039527981
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 157.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
modelTrain_mean_accuracy_cv_rf <- mean_accuracy_rf_model
print(modelTrain_mean_accuracy_cv_rf)
## [1] 0.6637607
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")

train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
modelTrain_rf_trainAccuracy <- train_accuracy
print(modelTrain_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_modelTrain_rf <- caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_modelTrain_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 127  63
##         CN   1   3
##                                           
##                Accuracy : 0.6701          
##                  95% CI : (0.5991, 0.7358)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 0.4131          
##                                           
##                   Kappa : 0.0487          
##                                           
##  Mcnemar's Test P-Value : 2.44e-14        
##                                           
##             Sensitivity : 0.99219         
##             Specificity : 0.04545         
##          Pos Pred Value : 0.66842         
##          Neg Pred Value : 0.75000         
##              Prevalence : 0.65979         
##          Detection Rate : 0.65464         
##    Detection Prevalence : 0.97938         
##       Balanced Accuracy : 0.51882         
##                                           
##        'Positive' Class : CI              
## 
cm_modelTrain_rf_Accuracy <- cm_modelTrain_rf$overall["Accuracy"]
cm_modelTrain_rf_Kappa <- cm_modelTrain_rf$overall["Kappa"]
print(cm_modelTrain_rf_Accuracy)
##  Accuracy 
## 0.6701031
print(cm_modelTrain_rf_Kappa)
##      Kappa 
## 0.04872816
importance_rf_model <- varImp(rf_model)

print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 313)
## 
##            Importance
## cg23432430     100.00
## cg11019791      74.47
## cg03749159      72.88
## cg11331837      70.13
## cg21697769      67.79
## cg01008088      67.13
## cg04768387      66.18
## cg16431720      63.43
## cg00415024      62.26
## cg12784167      62.17
## cg23159970      61.95
## cg24851651      59.58
## cg17042243      59.56
## cg09451339      57.62
## PC3             55.39
## cg17386240      54.93
## PC2             54.69
## cg04109990      54.52
## cg06697310      54.35
## cg14192979      54.33
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
# Keep the feature names: arrange() drops rownames, so store them in a column first.
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
# Keep the feature names: arrange() drops rownames, so store them in a column first.
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==3){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
# Keep the feature names: arrange() drops rownames, so store them in a column first.
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))

print(Ordered_importance_rf_final_model)
  
}
##               CI           CN
## 1    4.533646382  4.533646382
## 2    2.826616756  2.826616756
## 3    2.720567988  2.720567988
## 4    2.536590584  2.536590584
## 5    2.380574060  2.380574060
## 6    2.335993233  2.335993233
## 7    2.272574779  2.272574779
## 8    2.088853264  2.088853264
## 9    2.010312547  2.010312547
## 10   2.004291812  2.004291812
## 11   1.989529692  1.989529692
## 12   1.831215081  1.831215081
## 13   1.830294344  1.830294344
## 14   1.700345913  1.700345913
## 15   1.551390508  1.551390508
## 16   1.520257908  1.520257908
## 17   1.504252329  1.504252329
## 18   1.493185873  1.493185873
## 19   1.481670227  1.481670227
## 20   1.480408410  1.480408410
## 21   1.446962190  1.446962190
## 22   1.436662424  1.436662424
## 23   1.419819595  1.419819595
## 24   1.418115344  1.418115344
## 25   1.415596947  1.415596947
## 26   1.406438429  1.406438429
## 27   1.406239440  1.406239440
## 28   1.384580003  1.384580003
## 29   1.379197755  1.379197755
## 30   1.354287084  1.354287084
## 31   1.347072236  1.347072236
## 32   1.330866999  1.330866999
## 33   1.305905015  1.305905015
## 34   1.303181565  1.303181565
## 35   1.235686518  1.235686518
## 36   1.227521734  1.227521734
## 37   1.203469750  1.203469750
## 38   1.199763791  1.199763791
## 39   1.195626994  1.195626994
## 40   1.183024512  1.183024512
## 41   1.175732698  1.175732698
## 42   1.174570250  1.174570250
## 43   1.162881540  1.162881540
## 44   1.158727441  1.158727441
## 45   1.144242298  1.144242298
## 46   1.134040359  1.134040359
## 47   1.097663038  1.097663038
## 48   1.092880677  1.092880677
## 49   1.071866003  1.071866003
## 50   1.060868329  1.060868329
## 51   1.046664195  1.046664195
## 52   1.014913250  1.014913250
## 53   1.006734858  1.006734858
## 54   0.988089664  0.988089664
## 55   0.980175685  0.980175685
## 56   0.932884451  0.932884451
## 57   0.918548775  0.918548775
## 58   0.906548960  0.906548960
## 59   0.898209088  0.898209088
## 60   0.895690672  0.895690672
## 61   0.891856479  0.891856479
## 62   0.882704329  0.882704329
## 63   0.866465342  0.866465342
## 64   0.852447727  0.852447727
## 65   0.847507411  0.847507411
## 66   0.828755893  0.828755893
## 67   0.827252900  0.827252900
## 68   0.810969997  0.810969997
## 69   0.798024544  0.798024544
## 70   0.769381132  0.769381132
## 71   0.749023937  0.749023937
## 72   0.748578813  0.748578813
## 73   0.742609264  0.742609264
## 74   0.732379664  0.732379664
## 75   0.705973598  0.705973598
## 76   0.692837281  0.692837281
## 77   0.688560925  0.688560925
## 78   0.682577323  0.682577323
## 79   0.671669871  0.671669871
## 80   0.665601363  0.665601363
## 81   0.656754855  0.656754855
## 82   0.653073911  0.653073911
## 83   0.631601305  0.631601305
## 84   0.630051329  0.630051329
## 85   0.626039694  0.626039694
## 86   0.623817117  0.623817117
## 87   0.611441377  0.611441377
## 88   0.607582326  0.607582326
## 89   0.594048636  0.594048636
## 90   0.577933117  0.577933117
## 91   0.574326439  0.574326439
## 92   0.559595685  0.559595685
## 93   0.556572809  0.556572809
## 94   0.556147457  0.556147457
## 95   0.536805063  0.536805063
## 96   0.504705315  0.504705315
## 97   0.490069442  0.490069442
## 98   0.481136527  0.481136527
## 99   0.476896638  0.476896638
## 100  0.465291423  0.465291423
## 101  0.463892261  0.463892261
## 102  0.452323656  0.452323656
## 103  0.442123224  0.442123224
## 104  0.439028567  0.439028567
## 105  0.435511756  0.435511756
## 106  0.432195455  0.432195455
## 107  0.431651497  0.431651497
## 108  0.421444133  0.421444133
## 109  0.417795807  0.417795807
## 110  0.411368417  0.411368417
## 111  0.410144901  0.410144901
## 112  0.403400332  0.403400332
## 113  0.399414862  0.399414862
## 114  0.397659994  0.397659994
## 115  0.395029475  0.395029475
## 116  0.394376081  0.394376081
## 117  0.388612951  0.388612951
## 118  0.372748627  0.372748627
## 119  0.372673830  0.372673830
## 120  0.372109988  0.372109988
## 121  0.371503201  0.371503201
## 122  0.336151367  0.336151367
## 123  0.335277278  0.335277278
## 124  0.327339574  0.327339574
## 125  0.321047893  0.321047893
## 126  0.308623412  0.308623412
## 127  0.306688890  0.306688890
## 128  0.295877169  0.295877169
## 129  0.270232468  0.270232468
## 130  0.262897363  0.262897363
## 131  0.262601998  0.262601998
## 132  0.250614962  0.250614962
## 133  0.241536502  0.241536502
## 134  0.225148032  0.225148032
## 135  0.211258479  0.211258479
## 136  0.206389866  0.206389866
## 137  0.204852274  0.204852274
## 138  0.202647229  0.202647229
## 139  0.201403188  0.201403188
## 140  0.174170570  0.174170570
## 141  0.158201124  0.158201124
## 142  0.142416850  0.142416850
## 143  0.137118539  0.137118539
## 144  0.136623080  0.136623080
## 145  0.134657947  0.134657947
## 146  0.133488102  0.133488102
## 147  0.129153901  0.129153901
## 148  0.124195344  0.124195344
## 149  0.112348706  0.112348706
## 150  0.099693622  0.099693622
## 151  0.083949277  0.083949277
## 152  0.083292061  0.083292061
## 153  0.077731225  0.077731225
## 154  0.070108091  0.070108091
## 155  0.061671663  0.061671663
## 156  0.059739976  0.059739976
## 157  0.058419380  0.058419380
## 158  0.058104487  0.058104487
## 159  0.057315818  0.057315818
## 160  0.056149654  0.056149654
## 161  0.051495304  0.051495304
## 162  0.045485472  0.045485472
## 163  0.036623888  0.036623888
## 164  0.034518767  0.034518767
## 165  0.028672922  0.028672922
## 166  0.020221873  0.020221873
## 167  0.010445791  0.010445791
## 168 -0.000578254 -0.000578254
## 169 -0.001896650 -0.001896650
## 170 -0.020139876 -0.020139876
## 171 -0.020273501 -0.020273501
## 172 -0.025361192 -0.025361192
## 173 -0.037003575 -0.037003575
## 174 -0.046634085 -0.046634085
## 175 -0.066611546 -0.066611546
## 176 -0.073104989 -0.073104989
## 177 -0.079129409 -0.079129409
## 178 -0.081048211 -0.081048211
## 179 -0.081751132 -0.081751132
## 180 -0.085303258 -0.085303258
## 181 -0.094681467 -0.094681467
## 182 -0.096709048 -0.096709048
## 183 -0.100293586 -0.100293586
## 184 -0.108943302 -0.108943302
## 185 -0.114558840 -0.114558840
## 186 -0.125599953 -0.125599953
## 187 -0.136596143 -0.136596143
## 188 -0.143532960 -0.143532960
## 189 -0.151773324 -0.151773324
## 190 -0.156816573 -0.156816573
## 191 -0.169871042 -0.169871042
## 192 -0.175826589 -0.175826589
## 193 -0.177335726 -0.177335726
## 194 -0.193863154 -0.193863154
## 195 -0.194616547 -0.194616547
## 196 -0.194738407 -0.194738407
## 197 -0.195729745 -0.195729745
## 198 -0.198309465 -0.198309465
## 199 -0.198342963 -0.198342963
## 200 -0.202530122 -0.202530122
## 201 -0.226232415 -0.226232415
## 202 -0.227916651 -0.227916651
## 203 -0.241102317 -0.241102317
## 204 -0.244616253 -0.244616253
## 205 -0.259951584 -0.259951584
## 206 -0.272876028 -0.272876028
## 207 -0.278187191 -0.278187191
## 208 -0.280828739 -0.280828739
## 209 -0.281579714 -0.281579714
## 210 -0.286762155 -0.286762155
## 211 -0.293310947 -0.293310947
## 212 -0.299157985 -0.299157985
## 213 -0.301805804 -0.301805804
## 214 -0.308247530 -0.308247530
## 215 -0.319557728 -0.319557728
## 216 -0.322988141 -0.322988141
## 217 -0.326681800 -0.326681800
## 218 -0.332714522 -0.332714522
## 219 -0.338952224 -0.338952224
## 220 -0.339971598 -0.339971598
## 221 -0.352679019 -0.352679019
## 222 -0.353230342 -0.353230342
## 223 -0.369029760 -0.369029760
## 224 -0.371470412 -0.371470412
## 225 -0.373961330 -0.373961330
## 226 -0.382465257 -0.382465257
## 227 -0.386917627 -0.386917627
## 228 -0.399341350 -0.399341350
## 229 -0.402302463 -0.402302463
## 230 -0.408241965 -0.408241965
## 231 -0.409813988 -0.409813988
## 232 -0.414648762 -0.414648762
## 233 -0.424943962 -0.424943962
## 234 -0.433299324 -0.433299324
## 235 -0.453012221 -0.453012221
## 236 -0.460004138 -0.460004138
## 237 -0.463170582 -0.463170582
## 238 -0.481935034 -0.481935034
## 239 -0.493647685 -0.493647685
## 240 -0.516928175 -0.516928175
## 241 -0.526149835 -0.526149835
## 242 -0.532202186 -0.532202186
## 243 -0.535366964 -0.535366964
## 244 -0.558007364 -0.558007364
## 245 -0.562345376 -0.562345376
## 246 -0.562605701 -0.562605701
## 247 -0.574792306 -0.574792306
## 248 -0.581042054 -0.581042054
## 249 -0.583570986 -0.583570986
## 250 -0.586475888 -0.586475888
## 251 -0.590670069 -0.590670069
## 252 -0.592489679 -0.592489679
## 253 -0.594756659 -0.594756659
## 254 -0.605822004 -0.605822004
## 255 -0.610570811 -0.610570811
## 256 -0.627135299 -0.627135299
## 257 -0.640749006 -0.640749006
## 258 -0.651822912 -0.651822912
## 259 -0.660359145 -0.660359145
## 260 -0.678556354 -0.678556354
## 261 -0.693322771 -0.693322771
## 262 -0.702283567 -0.702283567
## 263 -0.764283534 -0.764283534
## 264 -0.783938282 -0.783938282
## 265 -0.785053492 -0.785053492
## 266 -0.794818056 -0.794818056
## 267 -0.822035882 -0.822035882
## 268 -0.822907697 -0.822907697
## 269 -0.833669030 -0.833669030
## 270 -0.852913709 -0.852913709
## 271 -0.865147989 -0.865147989
## 272 -0.867271942 -0.867271942
## 273 -0.880726097 -0.880726097
## 274 -0.906037727 -0.906037727
## 275 -0.911810922 -0.911810922
## 276 -0.913382748 -0.913382748
## 277 -0.920456895 -0.920456895
## 278 -0.932132012 -0.932132012
## 279 -0.943349589 -0.943349589
## 280 -0.947349785 -0.947349785
## 281 -0.999787513 -0.999787513
## 282 -1.019550666 -1.019550666
## 283 -1.030915523 -1.030915523
## 284 -1.043273548 -1.043273548
## 285 -1.045907732 -1.045907732
## 286 -1.069214248 -1.069214248
## 287 -1.086773569 -1.086773569
## 288 -1.089861228 -1.089861228
## 289 -1.090008421 -1.090008421
## 290 -1.094748841 -1.094748841
## 291 -1.126643500 -1.126643500
## 292 -1.139508688 -1.139508688
## 293 -1.164856016 -1.164856016
## 294 -1.190780496 -1.190780496
## 295 -1.198059681 -1.198059681
## 296 -1.225211590 -1.225211590
## 297 -1.243214639 -1.243214639
## 298 -1.277725269 -1.277725269
## 299 -1.278931210 -1.278931210
## 300 -1.341878084 -1.341878084
## 301 -1.351042340 -1.351042340
## 302 -1.354423486 -1.354423486
## 303 -1.374326828 -1.374326828
## 304 -1.395629053 -1.395629053
## 305 -1.449515709 -1.449515709
## 306 -1.508409723 -1.508409723
## 307 -1.542663329 -1.542663329
## 308 -1.621817420 -1.621817420
## 309 -1.640123195 -1.640123195
## 310 -1.655695609 -1.655695609
## 311 -1.973088543 -1.973088543
## 312 -2.000212934 -2.000212934
## 313 -2.151840452 -2.151840452
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  print(auc_value) 
  modelTrain_rf_AUC <- auc_value
  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.69

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_rf_AUC <- mean_auc
}
print(modelTrain_rf_AUC)
## Area under the curve: 0.69

6. SVM

6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 454 samples
## 313 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 364, 363 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.8281563  0.6291507
##   0.50  0.8281319  0.6251914
##   1.00  0.8325763  0.6330576
## 
## Tuning parameter 'sigma' was held constant at a value of 0.001632383
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.001632383 and C = 1.
print(svm_model$bestTune)
##         sigma C
## 3 0.001632383 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8296215
modelTrain_mean_accuracy_cv_svm <- mean_accuracy_svm_model
print(modelTrain_mean_accuracy_cv_svm)
## [1] 0.8296215
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.991189427312775"
modelTrain_svm_trainAccuracy <-train_accuracy
print(modelTrain_svm_trainAccuracy)
## [1] 0.9911894
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_modelTrain_svm <- caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_modelTrain_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 113  11
##         CN  15  55
##                                           
##                Accuracy : 0.866           
##                  95% CI : (0.8098, 0.9105)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 5.65e-11        
##                                           
##                   Kappa : 0.7058          
##                                           
##  Mcnemar's Test P-Value : 0.5563          
##                                           
##             Sensitivity : 0.8828          
##             Specificity : 0.8333          
##          Pos Pred Value : 0.9113          
##          Neg Pred Value : 0.7857          
##              Prevalence : 0.6598          
##          Detection Rate : 0.5825          
##    Detection Prevalence : 0.6392          
##       Balanced Accuracy : 0.8581          
##                                           
##        'Positive' Class : CI              
## 
cm_modelTrain_svm_Accuracy <- cm_modelTrain_svm$overall["Accuracy"]
cm_modelTrain_svm_Kappa <- cm_modelTrain_svm$overall["Kappa"]
print(cm_modelTrain_svm_Accuracy)
##  Accuracy 
## 0.8659794
print(cm_modelTrain_svm_Kappa)
##     Kappa 
## 0.7057863

Let’s take a look at the feature importance of the trained model.

library(iml)

predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 314 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg23432430      1.140000   1.166667      1.193333        0.05401235
## 2 cg12333628      1.080000   1.133333      1.133333        0.05246914
## 3 cg00962106      1.053333   1.133333      1.186667        0.05246914
## 4 cg03600007      1.040000   1.100000      1.100000        0.05092593
## 5 cg19799454      1.040000   1.100000      1.100000        0.05092593
## 6 cg15775217      1.013333   1.100000      1.126667        0.05092593
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4|| METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc
  modelTrain_svm_AUC <- auc_value
  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9156
## [1] "The auc vlue is:"
## Area under the curve: 0.9156

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    modelTrain_svm_AUC <- mean_auc
}

7. Important Features

7.0 Choose Number of Top Features

# GOTO "INPUT" Session to set the Number of common features needed

NUM_COMMON_FEATURES <- NUM_COMMON_FEATURES_SET

7.1 Merge Important Features

The feature importances cannot be combined directly, since they are not all on the same scale; the SVM model, for example, uses a different (permutation-based) measure of feature importance.

So, let’s consider rescaling the importances to bring them into the same range.
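
A minimal sketch of this rescaling, assuming each processed data frame ends up with one numeric importance column (the helper rescale_0_100 below is illustrative, not part of the pipeline above):

# Illustrative helper: rescale a numeric importance vector to [0, 100]
# so that importances from different models are comparable.
rescale_0_100 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(0, length(x)))  # guard against a constant column
  100 * (x - rng[1]) / diff(rng)
}

# Hypothetical usage on one of the processed data frames:
# importance_SVM_df_processed$Importance_SVM <- rescale_0_100(importance_SVM_df_processed$Importance_SVM)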

First, let’s process each data frame to ensure they all have a consistent format.

if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
# Process the dataframe to ensure they have consistent format.

# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"

head(importance_SVM_df_processed)

# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
importance_model_LRM1_df_processed$Feature<-rownames(importance_model_LRM1_df_processed)
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "Overall"] <- "Importance_LRM1"

head(importance_model_LRM1_df_processed)

# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed$Feature<-rownames(importance_elastic_net_model1_df_processed)
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "Overall"] <- "Importance_ENM1"

head(importance_elastic_net_model1_df_processed)



# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"

head(importance_xgb_model_df_processed)


# RF

importance_rf_model_df_processed <- importance_rf_model_df

if (METHOD_FEATURE_FLAG_NUM == 3){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(CI, CN))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}

if (METHOD_FEATURE_FLAG_NUM == 4){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia, CN))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}



if (METHOD_FEATURE_FLAG_NUM == 5){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, CN))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}

if (METHOD_FEATURE_FLAG_NUM == 6){
  
  importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
  
  importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, Dementia))
  
  colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"


}


head(importance_rf_model_df_processed)


}

From the above (binary case), all data frames now share the same structure, with the columns 'Importance' and 'Feature' in order.

If our case is the multiclass classification, see below. Except for the XGBoost and SVM models, each model's feature importance is computed as the maximum importance across the classes.

if(METHOD_FEATURE_FLAG == 1){
  
# Process the dataframe to ensure they have consistent format.

# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"

head(importance_SVM_df_processed)

# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "MaxImportance"] <- "Importance_LRM1"
importance_model_LRM1_df_processed <- subset(importance_model_LRM1_df_processed, select = -c(Dementia,MCI, CN))
head(importance_model_LRM1_df_processed)

# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed <- subset(importance_elastic_net_model1_df_processed, select = -c(Dementia,MCI, CN))

colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "MaxImportance"] <- "Importance_ENM1"

head(importance_elastic_net_model1_df_processed)



# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)

colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"


head(importance_xgb_model_df_processed)


# RF

importance_rf_model_df_processed <- importance_rf_model_df
  
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia,MCI, CN))
  
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "MaxImportance"] <- "Importance_RF"

head(importance_rf_model_df_processed)

}

Then, Let’s do scaling, here we choose min-max scaling.

importance_list <- list(logistic = importance_model_LRM1_df_processed, 
                        xgb = importance_xgb_model_df_processed, 
                        elastic_net = importance_elastic_net_model1_df_processed, 
                        rf = importance_rf_model_df_processed, 
                        svm = importance_SVM_df_processed)


min_max_scale_Imp <- function(df){
  x <- df[, grepl("Importance_", colnames(df))]
  # Note: if an importance column were constant, max(x) - min(x) would be 0 and
  # this would divide by zero; the model importances here always vary, so no guard is added.
  df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
  return(df)
}

for (i in seq_along(importance_list)) {
    importance_list[[i]] <- min_max_scale_Imp(importance_list[[i]])
}


# Print each data frame after scaling
print(head(importance_list[[1]]))
##            Importance_LRM1    Feature
## age.now         0.00416345    age.now
## PC1             0.66841052        PC1
## PC2             0.43081520        PC2
## PC3             1.00000000        PC3
## cg18993517      0.09667700 cg18993517
## cg13573375      0.09253355 cg13573375
print(head(importance_list[[2]]))
##            Importance_XGB    Feature
## cg23432430      1.0000000 cg23432430
## age.now         0.6349744    age.now
## cg11438323      0.6199265 cg11438323
## cg11540596      0.5756862 cg11540596
## cg03660162      0.5201740 cg03660162
## cg17002719      0.4538267 cg17002719
print(head(importance_list[[3]]))
##            Importance_ENM1    Feature
## age.now        0.003392297    age.now
## PC1            0.759892549        PC1
## PC2            0.845266706        PC2
## PC3            1.000000000        PC3
## cg18993517     0.186716182 cg18993517
## cg13573375     0.150726625 cg13573375
print(head(importance_list[[4]]))
##            Importance_RF    Feature
## age.now        0.5289698    age.now
## PC1            0.3091080        PC1
## PC2            0.5468701        PC2
## PC3            0.5539209        PC3
## cg18993517     0.2797494 cg18993517
## cg13573375     0.2691145 cg13573375
print(head(importance_list[[5]]))
##   Importance_SVM    Feature
## 1      1.0000000 cg23432430
## 2      0.8333333 cg12333628
## 3      0.8333333 cg00962106
## 4      0.6666667 cg03600007
## 5      0.6666667 cg19799454
## 6      0.6666667 cg15775217

Now, Let’s merge the data frames of scaled feature importance.

# Merge all importances
combined_importance <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), importance_list)

head(combined_importance)
# Replace NA with 0
combined_importance[is.na(combined_importance)] <- 0

# Exclude DX, as it's label

combined_importance <- combined_importance %>% 
  filter(Feature != "DX")

# View the filtered dataframe
head(combined_importance)
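
As a toy illustration of what the Reduce/merge chain above does (made-up two-model example):

a <- data.frame(Feature = c("f1", "f2"), Importance_A = c(0.9, 0.4))
b <- data.frame(Feature = c("f2", "f3"), Importance_B = c(0.7, 0.2))
m <- merge(a, b, by = "Feature", all = TRUE)  # full outer join on Feature
m[is.na(m)] <- 0                              # a feature absent from a model gets 0
print(m)
##   Feature Importance_A Importance_B
## 1      f1          0.9          0.0
## 2      f2          0.4          0.7
## 3      f3          0.0          0.2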

7.2 View the Important Features

7.2.1 Select Based on AVG

Here we select the TOP Number of important features based on average importance (see the following).

combined_importance_AVF <- combined_importance
# Calculate average importance
combined_importance_AVF$Average_Importance <- rowMeans(combined_importance_AVF[,-1])

head(combined_importance_AVF)
combined_importance_Avg_ordered <- combined_importance_AVF[order(-combined_importance_AVF$Average_Importance),]

head(combined_importance_Avg_ordered)
# Top Number of common important features

print("the Top number of common features here is set to:")
## [1] "the Top number of common features here is set to:"
print(NUM_COMMON_FEATURES)
## [1] 20
top_Num_combined_importance_Avg_ordered <- head(combined_importance_Avg_ordered,n = NUM_COMMON_FEATURES)
print(top_Num_combined_importance_Avg_ordered)
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 275 cg23432430       0.4919901      1.0000000       0.6227286     1.0000000      1.0000000          0.8229437
## 313        PC3       1.0000000      0.0000000       1.0000000     0.5539209      0.1666667          0.5441175
## 312        PC2       0.4308152      0.3804470       0.8452667     0.5468701      0.3333333          0.5073465
## 19  cg00962106       0.4116418      0.3243396       0.5202316     0.2711677      0.8333333          0.4721428
## 311        PC1       0.6684105      0.0322316       0.7598925     0.3091080      0.5000000          0.4539285
## 104 cg07158503       0.4041543      0.4231865       0.5113692     0.3100314      0.5000000          0.4297483
## 95  cg06697310       0.4024778      0.2062702       0.4996506     0.5434923      0.3333333          0.3970448
## 148 cg11331837       0.2739320      0.2538522       0.3343517     0.7012849      0.3333333          0.3793508
## 107 cg07634717       0.2111241      0.4114846       0.3415290     0.4133254      0.5000000          0.3754926
## 55  cg03660162       0.1781057      0.5201740       0.3579817     0.3014357      0.5000000          0.3715394
## 285 cg24851651       0.2441837      0.3664597       0.3139634     0.5957764      0.3333333          0.3707433
## 140 cg11019791       0.1828977      0.3709727       0.2905242     0.7446664      0.1666667          0.3511455
## 249 cg20685672       0.2441862      0.2450869       0.3304414     0.2541069      0.6666667          0.3480976
## 298 cg26081710       0.3235607      0.2004048       0.4029228     0.3109325      0.5000000          0.3475642
## 176 cg14168080       0.3058494      0.4150016       0.3628690     0.4784251      0.1666667          0.3457624
## 58  cg03749159       0.1366550      0.3271771       0.1890326     0.7288038      0.3333333          0.3430004
## 248 cg20678988       0.2435940      0.3749217       0.3480510     0.2445465      0.5000000          0.3422226
## 63  cg04156077       0.2679896      0.2788390       0.2897414     0.3698890      0.5000000          0.3412918
## 241 cg19503462       0.2158285      0.2777136       0.3310961     0.3808573      0.5000000          0.3410991
## 301 cg26853071       0.2101158      0.1135166       0.2653192     0.4412341      0.6666667          0.3393705
# Top Number of common important features' name

top_Num_combined_importance_Avg_ordered_Nam <- top_Num_combined_importance_Avg_ordered$Feature

print(top_Num_combined_importance_Avg_ordered_Nam)
##  [1] "cg23432430" "PC3"        "PC2"        "cg00962106" "PC1"        "cg07158503" "cg06697310" "cg11331837" "cg07634717" "cg03660162" "cg24851651" "cg11019791" "cg20685672" "cg26081710" "cg14168080"
## [16] "cg03749159" "cg20678988" "cg04156077" "cg19503462" "cg26853071"

Visualization with a bar plot of the average feature importance.

ggplot(combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() +  # Flip coordinates to make it horizontal
  labs(title = "Feature Importance Sorted by Average Value",
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

Visualization with a bar plot of the top features' average importance.

ggplot(top_Num_combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() + 
  labs(title = paste("Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Average Value"),
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

7.2.2 Select Based on Quantile

The following shows how to select the TOP Number of important features based on a specific quantile of importance (here we use the median, i.e. the 50% quantile).

Let's create a new data frame with different quantiles of feature importance for each model.

Then order by the 50% quantile from high to low and select the top features based on that.
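
For a single feature, quantile() over its five scaled model importances returns the five-number summary used here (toy values):

quantile(c(0.1, 0.3, 0.5, 0.7, 0.9), probs = c(0, 0.25, 0.5, 0.75, 1))
##   0%  25%  50%  75% 100% 
##  0.1  0.3  0.5  0.7  0.9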

quantiles <- t(apply(combined_importance[,-1], 1, function(x) quantile(x, probs = c(0,0.25, 0.5, 0.75,1))))

combined_importance_quantiles <- cbind(Feature = combined_importance$Feature, quantiles)

combined_importance_quantiles <- as.data.frame(combined_importance_quantiles)

# cbind() with the character Feature column coerced everything to character,
# so convert the quantile columns back to numeric before ordering.
for (q in c("0%", "25%", "50%", "75%", "100%")) {
  combined_importance_quantiles[[q]] <- as.numeric(combined_importance_quantiles[[q]])
}

# Sort by median importance (50th percentile)
combined_importance_quantiles <- combined_importance_quantiles[order(-combined_importance_quantiles$`50%`), ]


head(combined_importance_quantiles)
top_Num_median_features_imp <- head(combined_importance_quantiles,n = NUM_COMMON_FEATURES)
print(top_Num_median_features_imp)
##        Feature          0%        25%       50%       75%      100%
## 275 cg23432430 0.491990101 0.62272860 1.0000000 1.0000000 1.0000000
## 313        PC3 0.000000000 0.16666667 0.5539209 1.0000000 1.0000000
## 1      age.now 0.003392297 0.00416345 0.5000000 0.5289698 0.6349744
## 311        PC1 0.032231598 0.30910796 0.5000000 0.6684105 0.7598925
## 312        PC2 0.333333333 0.38044700 0.4308152 0.5468701 0.8452667
## 104 cg07158503 0.310031430 0.40415432 0.4231865 0.5000000 0.5113692
## 19  cg00962106 0.271167721 0.32433959 0.4116418 0.5202316 0.8333333
## 107 cg07634717 0.211124130 0.34152897 0.4114846 0.4133254 0.5000000
## 95  cg06697310 0.206270193 0.33333333 0.4024778 0.4996506 0.5434923
## 176 cg14168080 0.166666667 0.30584936 0.3628690 0.4150016 0.4784251
## 55  cg03660162 0.178105731 0.30143569 0.3579817 0.5000000 0.5201740
## 33  cg02225060 0.014823116 0.16666667 0.3549348 0.4784559 0.5342408
## 106 cg07504457 0.166666667 0.27280725 0.3484476 0.3579963 0.4223343
## 134 cg10701746 0.036496597 0.33973960 0.3483977 0.4248608 0.5000000
## 248 cg20678988 0.243593957 0.24454648 0.3480510 0.3749217 0.5000000
## 121 cg09015880 0.015224654 0.33333333 0.3367794 0.3548429 0.3801625
## 242 cg19799454 0.004133443 0.10685290 0.3344244 0.4012688 0.6666667
## 2   cg00004073 0.252321139 0.26934952 0.3333333 0.3687430 0.3932005
## 6   cg00154902 0.205862688 0.22374844 0.3333333 0.3571489 0.4724526
## 46  cg02887598 0.068421117 0.29894356 0.3333333 0.3737771 0.4329452
top_Num_median_features_Name<-top_Num_median_features_imp$Feature
print(top_Num_median_features_Name)
##  [1] "cg23432430" "PC3"        "age.now"    "PC1"        "PC2"        "cg07158503" "cg00962106" "cg07634717" "cg06697310" "cg14168080" "cg03660162" "cg02225060" "cg07504457" "cg10701746" "cg20678988"
## [16] "cg09015880" "cg19799454" "cg00004073" "cg00154902" "cg02887598"

Visualization with the box plot.

library(tidyr)

long_df <- pivot_longer(combined_importance_quantiles, 
                        cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
                        names_to = "Quantile",
                        values_to = "Importance")

ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_boxplot() +
  coord_flip() +  
  labs(title = "Distribution of Feature Importances",
       x = "Feature",
       y = "Importance") +
  theme_minimal()


Visualization of the top features with a box plot.

library(tidyr)

long_df <- pivot_longer(top_Num_median_features_imp, 
                        cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
                        names_to = "Quantile",
                        values_to = "Importance")

ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_boxplot() +
  coord_flip() +
  labs(
    title = paste("Distribution of Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Median Value"),
       x = "Feature",
       y = "Importance") +
  theme_minimal()

7.2.3 Select Based on Frequency/Common

The frequency / common feature importance is processed as follows (a compact sketch is shown after the list):

  1. Select the TOP Number of features (say 40) for each model (this number is set by "NUM_COMMON_FEATURES_SET_Frequency" in the INPUT session).
  2. Calculate how often each feature appears among the Top Number of features selected in step 1.
  3. Each feature that appears at least half of the time is considered important; these features are collected as the common features.
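A compact sketch of the three steps on a hypothetical list of per-model TOP-N feature-name vectors (the per-model code below is what is actually run):

top_list <- list(  # hypothetical TOP-N feature names per model
  LRM = c("cg23432430", "PC1", "PC2"),
  XGB = c("cg23432430", "age.now", "PC2"),
  ENM = c("cg23432430", "PC1", "PC3"),
  RF  = c("cg23432430", "PC2", "PC3"),
  SVM = c("cg23432430", "PC1", "cg00962106")
)
freq   <- table(unlist(top_list))                     # step 2: appearance counts
common <- names(freq[freq >= length(top_list) / 2])   # step 3: keep if in >= half the models
print(common)  # cg23432430, PC1 and PC2 appear in at least 3 of the 5 lists
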
n_select_frequencyWay <- NUM_COMMON_FEATURES_SET_Frequency
combined_importance_freq_ordered_df<-combined_importance_Avg_ordered
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature

# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature


# ENM
## all_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature


# RF
## all_impAvg_orderby_RF
All_impAvg_orderby_RF <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature


# SVM
## all_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))

models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models), 
                         dimnames = list(all_features, models))

# Fill the dataframe indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
  feature_matrix[feature, "LRM"] <- 
    as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
  feature_matrix[feature, "XGB"] <- 
    as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
  feature_matrix[feature, "ENM"] <- 
    as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
  feature_matrix[feature, "RF"] <- 
    as.integer(feature %in% top_impAvg_orderby_RF_NAME)
  feature_matrix[feature, "SVM"] <- 
    as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}

feature_df <- as.data.frame(feature_matrix)

print(head(feature_df))
##            LRM XGB ENM RF SVM
## PC3          1   0   1  1   0
## PC1          1   0   1  0   1
## cg23432430   1   1   1  1   1
## cg09727210   1   0   1  0   0
## PC2          1   1   1  1   0
## cg00962106   1   0   1  0   1

For quick reading, we count how many times each feature appears by computing the row sums and adding the result as a new column in our data frame.

feature_df$Total_Count <- rowSums(feature_df[,1:5])
feature_df <- feature_df[order(-feature_df$Total_Count), ]
frequency_feature_df_RAW_ordered<-feature_df
print(feature_df)
##            LRM XGB ENM RF SVM Total_Count
## cg23432430   1   1   1  1   1           5
## PC2          1   1   1  1   0           4
## cg07158503   1   1   1  0   1           4
## PC3          1   0   1  1   0           3
## PC1          1   0   1  0   1           3
## cg00962106   1   0   1  0   1           3
## cg06697310   1   0   1  1   0           3
## cg26081710   1   0   1  0   1           3
## cg09727210   1   0   1  0   0           2
## cg02225060   1   0   1  0   0           2
## cg09015880   1   0   1  0   0           2
## cg16338321   1   0   1  0   0           2
## cg00819121   1   0   1  0   0           2
## cg00415024   1   0   0  1   0           2
## cg21757617   1   0   1  0   0           2
## cg14168080   1   1   0  0   0           2
## cg02887598   1   0   1  0   0           2
## cg05064044   1   0   1  0   0           2
## cg03660162   0   1   0  0   1           2
## cg07634717   0   1   0  0   1           2
## cg19799454   0   1   0  0   1           2
## cg20678988   0   1   0  0   1           2
## cg11019791   0   1   0  1   0           2
## cg10701746   1   0   0  0   0           1
## cg01910713   1   0   0  0   0           1
## age.now      0   1   0  0   0           1
## cg11438323   0   1   0  0   0           1
## cg11540596   0   1   0  0   0           1
## cg17002719   0   1   0  0   0           1
## cg09120722   0   1   0  0   0           1
## cg17002338   0   1   0  0   0           1
## cg11227702   0   1   0  0   0           1
## cg18816397   0   1   0  0   0           1
## cg02122327   0   1   0  0   0           1
## cg13573375   0   1   0  0   0           1
## cg03088219   0   1   0  0   0           1
## cg06277607   0   0   1  0   0           1
## cg27272246   0   0   1  0   0           1
## cg00004073   0   0   1  0   0           1
## cg17429539   0   0   1  0   0           1
## cg03749159   0   0   0  1   0           1
## cg11331837   0   0   0  1   0           1
## cg21697769   0   0   0  1   0           1
## cg01008088   0   0   0  1   0           1
## cg04768387   0   0   0  1   0           1
## cg16431720   0   0   0  1   0           1
## cg12784167   0   0   0  1   0           1
## cg23159970   0   0   0  1   0           1
## cg24851651   0   0   0  1   0           1
## cg17042243   0   0   0  1   0           1
## cg09451339   0   0   0  1   0           1
## cg17386240   0   0   0  1   0           1
## cg04109990   0   0   0  1   0           1
## cg14192979   0   0   0  1   0           1
## cg12333628   0   0   0  0   1           1
## cg20685672   0   0   0  0   1           1
## cg26853071   0   0   0  0   1           1
## cg24883219   0   0   0  0   1           1
## cg06833284   0   0   0  0   1           1
## cg03600007   0   0   0  0   1           1
## cg01280698   0   0   0  0   1           1
## cg13226272   0   0   0  0   1           1
## cg15775217   0   0   0  0   1           1
## cg04156077   0   0   0  0   1           1
## cg19503462   0   0   0  0   1           1

Combine with the importance data frame

all_features <- union(combined_importance_freq_ordered_df$Feature, rownames(feature_df))
# note that the combined importance table used here is the one from before filtering
# combine them based on the common-feature (frequency) selection method:
# if a feature from the importance table is absent here, add it with a value of zero
feature_df_full <- data.frame(Feature = all_features)
feature_df_full <- merge(feature_df_full, feature_df, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_df_full[is.na(feature_df_full)] <- 0


# For top_impAvg_ordered
all_impAvg_ordered_full <- data.frame(Feature = all_features)
all_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,all_impAvg_ordered_full, by.x = "Feature", by.y = "Feature", all.x = TRUE)
all_impAvg_ordered_full[is.na(all_impAvg_ordered_full)] <- 0
all_combined_df_impAvg <- merge(feature_df_full, all_impAvg_ordered_full, by = "Feature", all = TRUE)

print(head(feature_df_full))
##      Feature LRM XGB ENM RF SVM Total_Count
## 1    age.now   0   1   0  0   0           1
## 2 cg00004073   0   0   1  0   0           1
## 3 cg00084271   0   0   0  0   0           0
## 4 cg00086247   0   0   0  0   0           0
## 5 cg00146240   0   0   0  0   0           0
## 6 cg00154902   0   0   0  0   0           0
print(head(all_impAvg_ordered_full))
##      Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now      0.00416345     0.63497444     0.003392297     0.5289698      0.5000000          0.3343000
## 2 cg00004073      0.26934952     0.25232114     0.368743031     0.3932005      0.3333333          0.3233895
## 3 cg00084271      0.22358222     0.08451443     0.272790932     0.5066986      0.1666667          0.2508506
## 4 cg00086247      0.00000000     0.15625070     0.068094153     0.2757605      0.0000000          0.1000211
## 5 cg00146240      0.08729337     0.00000000     0.195466203     0.5233594      0.1666667          0.1945571
## 6 cg00154902      0.20586269     0.35714894     0.223748437     0.4724526      0.3333333          0.3185092
print(head(all_combined_df_impAvg))
##      Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now   0   1   0  0   0           1      0.00416345     0.63497444     0.003392297     0.5289698      0.5000000          0.3343000
## 2 cg00004073   0   0   1  0   0           1      0.26934952     0.25232114     0.368743031     0.3932005      0.3333333          0.3233895
## 3 cg00084271   0   0   0  0   0           0      0.22358222     0.08451443     0.272790932     0.5066986      0.1666667          0.2508506
## 4 cg00086247   0   0   0  0   0           0      0.00000000     0.15625070     0.068094153     0.2757605      0.0000000          0.1000211
## 5 cg00146240   0   0   0  0   0           0      0.08729337     0.00000000     0.195466203     0.5233594      0.1666667          0.1945571
## 6 cg00154902   0   0   0  0   0           0      0.20586269     0.35714894     0.223748437     0.4724526      0.3333333          0.3185092

Frequency Feature Selection

Choose the mutually important features: those that appear in at least half of the models' (i.e. 3 of 5 in our case) top selected important-feature lists.

if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG==6){
# keep features that appear in the top list of at least 3 of the 5 models
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data[,c("DX",df_process_mutual_FeatureName)]

# length() on a data frame returns its number of columns; subtract 1 for the DX label
print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
## [1] "The number of final used features of common importance method: 8"
if(METHOD_FEATURE_FLAG == 1){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data_m1[,c("DX",df_process_mutual_FeatureName)]

print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
print(df_process_mutual_FeatureName)
## [1] "cg23432430" "PC2"        "cg07158503" "PC3"        "PC1"        "cg00962106" "cg06697310" "cg26081710"

Importance of these features:

Top_Frequency_Feature_importance <- combined_importance_freq_ordered_df[
    combined_importance_freq_ordered_df$Feature %in% df_process_mutual_FeatureName,
]

print(Top_Frequency_Feature_importance)
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 275 cg23432430       0.4919901      1.0000000       0.6227286     1.0000000      1.0000000          0.8229437
## 313        PC3       1.0000000      0.0000000       1.0000000     0.5539209      0.1666667          0.5441175
## 312        PC2       0.4308152      0.3804470       0.8452667     0.5468701      0.3333333          0.5073465
## 19  cg00962106       0.4116418      0.3243396       0.5202316     0.2711677      0.8333333          0.4721428
## 311        PC1       0.6684105      0.0322316       0.7598925     0.3091080      0.5000000          0.4539285
## 104 cg07158503       0.4041543      0.4231865       0.5113692     0.3100314      0.5000000          0.4297483
## 95  cg06697310       0.4024778      0.2062702       0.4996506     0.5434923      0.3333333          0.3970448
## 298 cg26081710       0.3235607      0.2004048       0.4029228     0.3109325      0.5000000          0.3475642
ggplot(Top_Frequency_Feature_importance, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
  geom_bar(stat = "identity") +
  coord_flip() + 
  labs(title = "Feature Importance Selected Based on Frequncy Way and Sorted by Average Value",
       x = "Feature",
       y = "Average Importance") +
  theme_minimal()

Important features selected by frequency but not by the average method

# Check whether all features from the mutual (frequency) method are also in the mean method, and print any that are not

all(df_process_mutual_FeatureName %in% top_Num_combined_importance_Avg_ordered_Nam)
## [1] TRUE
Mutual_not_in_Mean <- setdiff(df_process_mutual_FeatureName, top_Num_combined_importance_Avg_ordered_Nam)
print(Mutual_not_in_Mean)
## character(0)

SAVE AS RDATA - MAY NOT BE NEEDED

Overview of the Data Frame Variables.

Phenotype Part Data Frame: "phenoticPart_RAW"

RAW Merged Data Frame: "merged_df_raw"

Ordered Feature Importance Based on Quantile Data Frame: "combined_importance_quantiles"

Ordered Feature Importance Based on Mean Data Frame: "combined_importance_Avg_ordered"

Ordered Feature Frequency / Common Data Frames:

  • "frequency_feature_df_RAW_ordered": the selected features' frequencies, ordered by total appearance count.

  • "feature_df_full": the frequency of every feature from the steps of the frequency method; not ordered.

  • "all_combined_df_impAvg": the combined table of frequency and feature importance; not ordered.

head(phenoticPart_RAW)
# 
# save(NUM_COMMON_FEATURES,
#      combined_importance_quantiles,
#      combined_importance_Avg_ordered,
#      frequency_feature_df_RAW_ordered,
#      top_Num_median_features_Name,
#      top_Num_combined_importance_Avg_ordered_Nam,
#      file = "Part2_V8_08_top_features_5KCpGs.RData")
# 
# save(processed_data_m3,processed_data_m3_df,AfterProcess_FeatureName_m3,file = "Part2_V8_08_BinaryMerged_5KCpGs.RData")
# 
# save(phenoticPart_RAW, merged_df_raw, file = "PhenotypeAndMerged.RData")

8. Feature Selection and Output

8.0 Input - Number of Top Features and Method Choice.

The feature selection methods:

  1. based on mean feature importance ( set “INPUT_Method_Mean_Choose = TRUE” )
  2. based on median quantile feature importance ( set “INPUT_Method_Median_Choose = TRUE” )
  3. based on feature frequency importance ( set “INPUT_Method_Frequency_Choose = TRUE” )
    • Comment: if the feature frequency importance method is used, the input number of features N applies to the first step only (select the TOP N features for each model), so the number of features finally kept may not be exactly N.
  4. Setting a method's input flag to FALSE skips generating the data for that method; to output the data for every method, set all flags to TRUE. In summary, setting a flag to TRUE outputs the data set selected by the corresponding method.
Number_fea_input <- INPUT_NUMBER_FEATURES

Flag_8mean <- INPUT_Method_Mean_Choose 
Flag_8median <- INPUT_Method_Median_Choose 
Flag_8Fequency <- INPUT_Method_Frequency_Choose 
print(paste("the Top number of features here is set to:", Number_fea_input))
## [1] "the Top number of features here is set to: 250"
Flag_8mean
## [1] TRUE
Flag_8median
## [1] TRUE
Flag_8Fequency
## [1] TRUE

8.1 Selected For Output

Based on Mean

selected_impAvg_ordered <- head(combined_importance_Avg_ordered,n = Number_fea_input)
print(head(selected_impAvg_ordered))
##        Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 275 cg23432430       0.4919901      1.0000000       0.6227286     1.0000000      1.0000000          0.8229437
## 313        PC3       1.0000000      0.0000000       1.0000000     0.5539209      0.1666667          0.5441175
## 312        PC2       0.4308152      0.3804470       0.8452667     0.5468701      0.3333333          0.5073465
## 19  cg00962106       0.4116418      0.3243396       0.5202316     0.2711677      0.8333333          0.4721428
## 311        PC1       0.6684105      0.0322316       0.7598925     0.3091080      0.5000000          0.4539285
## 104 cg07158503       0.4041543      0.4231865       0.5113692     0.3100314      0.5000000          0.4297483
print(dim(selected_impAvg_ordered))
## [1] 250   7
selected_impAvg_ordered_NAME <- selected_impAvg_ordered$Feature

print(head(selected_impAvg_ordered_NAME))
## [1] "cg23432430" "PC3"        "PC2"        "cg00962106" "PC1"        "cg07158503"
df_selected_Mean <- processed_dataFrame[,c("DX",selected_impAvg_ordered_NAME)]
print(head(df_selected_Mean))
##                     DX cg23432430          PC3        PC2 cg00962106          PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080
## 200223270003_R02C01 CI  0.9482702 -0.014043316 0.01470293  0.9124898 -0.214185447  0.5777146  0.8454609 0.03692842  0.7483382  0.8691767 0.03674702  0.8112324  0.6712101  0.8751040  0.4190123
## 200223270003_R03C01 CN  0.9455418  0.005055871 0.05745834  0.5375751 -0.172761185  0.6203543  0.8653044 0.57150125  0.8254434  0.5160770 0.05358297  0.7831231  0.7932091  0.9198212  0.4420256
## 200223270003_R06C01 CN  0.9418716  0.029143653 0.08372861  0.5040948 -0.003667305  0.6236025  0.2405168 0.03182862  0.8181246  0.9026304 0.05968923  0.4353250  0.6613646  0.8801892  0.4355521
##                     cg03749159 cg20678988 cg04156077 cg19503462 cg26853071 age.now cg11540596 cg00415024 cg10701746 cg00004073 cg11227702 cg19471911 cg09727210 cg00154902 cg17002719 cg07504457
## 200223270003_R02C01  0.9355921  0.8438718  0.7321883  0.7951675  0.4233820    82.4  0.9238951  0.4299553  0.4795503 0.02928535 0.86486075  0.6334393  0.4240111  0.5137741 0.04939181  0.7116230
## 200223270003_R03C01  0.9153921  0.8548886  0.6865805  0.4537684  0.7451354    78.6  0.8926595  0.3999122  0.4868342 0.02787198 0.49184121  0.8437175  0.8812928  0.8540746 0.40466475  0.6854539
## 200223270003_R06C01  0.9255807  0.7786685  0.8501188  0.6997359  0.4228079    80.4  0.8820252  0.7465084  0.4927257 0.64576463 0.02543724  0.6127952  0.8493743  0.8188126 0.51428089  0.7205633
##                     cg25879395 cg01008088 cg02225060 cg12543766 cg09120722 cg11787167 cg19799454 cg02887598 cg01128042 cg21697769 cg25208881 cg16779438 cg17386240  cg03088219 cg24883219 cg15535896
## 200223270003_R02C01 0.88130864  0.8424817  0.6828159 0.51028134  0.5878977 0.03853894  0.9178930 0.04020908  0.9113420  0.8946108  0.1851956  0.8826150  0.7473400 0.844002862  0.6430473  0.3382952
## 200223270003_R03C01 0.02603438  0.2417656  0.8265195 0.88741539  0.8287506 0.04673831  0.9106247 0.67073881  0.5328806  0.2822953  0.9092286  0.5466924  0.7144809 0.007435243  0.6822115  0.9253926
## 200223270003_R06C01 0.91060615  0.2618620  0.5209552 0.02818501  0.8793344 0.32564508  0.9066551 0.73408417  0.5222757  0.8698740  0.9265502  0.8629492  0.8074824 0.120155222  0.5296903  0.3320191
##                     cg16338321 cg21757617 cg18285382 cg17429539 cg10738648 cg02078724 cg09015880 cg20823859 cg18816397 cg16431720 cg06833284 cg23517115 cg11438323 cg02932958 cg08096656 cg05064044
## 200223270003_R02C01  0.5350242 0.03652647  0.3202927  0.7860900 0.44931577  0.3096774  0.5101716  0.9030711  0.5472925  0.7356099  0.9125144  0.2151144  0.4863471  0.7901008  0.9362594  0.5672851
## 200223270003_R03C01  0.8294062 0.44299089  0.2930577  0.7100923 0.49894016  0.2896133  0.8402106  0.6062985  0.4940355  0.8692449  0.9003482  0.9131440  0.8984559  0.4210489  0.9314878  0.5358875
## 200223270003_R06C01  0.4918708 0.44725379  0.8923595  0.7660838 0.05552024  0.2805612  0.8472063  0.8917348  0.5337018  0.8773137  0.6097933  0.8328364  0.8722772  0.3825995  0.4943033  0.5273964
##                     cg05234269 cg25169289 cg14710850 cg26679884 cg03600007 cg15098922 cg01921484 cg16715186 cg06961873 cg12240569 cg01910713 cg25712921 cg00648024 cg03982462 cg08745107 cg26983017
## 200223270003_R02C01 0.93848584  0.1100884  0.8048592  0.6793815  0.5658487  0.9286092  0.9098550  0.2742789  0.5335591 0.82772064  0.8573169  0.2829848 0.51410972  0.8562777 0.02921338 0.89868232
## 200223270003_R03C01 0.57461229  0.7667174  0.8090950  0.1848705  0.6018832  0.9027517  0.9093137  0.7946153  0.5472606 0.02690547  0.8538850  0.6220919 0.40202875  0.6023731 0.78542320 0.03145466
## 200223270003_R06C01 0.02467208  0.2264993  0.8285902  0.1701734  0.8611166  0.8525611  0.9204487  0.8124316  0.9415177 0.46030640  0.8110366  0.6384003 0.05579011  0.8778458 0.02709928 0.84677625
##                     cg00084271 cg16858433 cg06371647 cg26846609 cg15184869 cg13573375 cg04831745 cg22931151 cg18918831 cg07640670 cg15600437 cg01280698 cg12689021 cg27577781 cg13405878 cg22666875
## 200223270003_R02C01  0.8103611  0.9184356  0.8336894 0.48860949  0.8622328  0.8670419 0.61984995  0.9311023  0.4891660 0.58296513  0.4885353  0.8985067  0.7706828  0.8143535  0.4549662  0.8177182
## 200223270003_R03C01  0.7877006  0.9194211  0.8198684 0.04878986  0.8996252  0.1733934 0.71214149  0.9356702  0.5333801 0.55225610  0.4894487  0.8846201  0.7449475  0.8113185  0.7858042  0.8291957
## 200223270003_R06C01  0.7706165  0.9271632  0.8069537 0.48026945  0.8688117  0.8888246 0.06871768  0.9328614  0.6406575 0.04058533  0.8551374  0.8847132  0.7872237  0.8144274  0.7583938  0.3694180
##                     cg16536985 cg16202259 cg18857647 cg22305850 cg27224751 cg09247979 cg12333628 cg16571124 cg03979311 cg12421087 cg15700429 cg13739190 cg00819121 cg25436480 cg04768387 cg24634455
## 200223270003_R02C01  0.5789643  0.9548726  0.8582332 0.03361934 0.44503947  0.5070956  0.9227884  0.9282854 0.86644909  0.5647607  0.7879010  0.8510103  0.9207001  0.8425160  0.3131047  0.7796391
## 200223270003_R03C01  0.5418687  0.3713483  0.8394132 0.57522232 0.03214912  0.5706177  0.9092861  0.9206431 0.06199853  0.5399655  0.9114530  0.8358482  0.9281472  0.4994032  0.9465814  0.5188241
## 200223270003_R06C01  0.8392044  0.4852461  0.2647491 0.58548744 0.83123722  0.5090215  0.5084647  0.9276842 0.72615553  0.5400348  0.8838233  0.8419471  0.9327211  0.3494312  0.9098563  0.5325725
##                     cg11133939 cg17042243 cg22542451 cg01608425 cg06864789 cg06880438 cg13387643 cg12702014 cg03737947 cg02823329 cg00696044 cg06960717 cg20673830 cg25649515 cg10681981 cg15633912
## 200223270003_R02C01  0.1282694  0.2502905  0.5884356  0.9030410 0.05369415  0.8285145  0.4229959  0.7704049 0.91824910  0.9462397 0.55608424  0.7030978  0.2422052  0.9279829  0.7035090  0.1605530
## 200223270003_R03C01  0.5920898  0.2933475  0.8337068  0.9264388 0.46053125  0.7988881  0.4200273  0.7848681 0.92067153  0.6464005 0.07552381  0.7653402  0.6881735  0.9235753  0.7382662  0.9333421
## 200223270003_R06C01  0.5127706  0.2725457  0.8125084  0.8887753 0.87513655  0.7839538  0.4161488  0.8065993 0.03638091  0.9633930 0.79270858  0.7206218  0.2134634  0.5895839  0.6971989  0.8737362
##                     cg02668233 cg27272246 cg18150287 cg18339359 cg04718469 cg01933473 cg02122327 cg18993517 cg02495179 cg02356645 cg09216282 cg09584650 cg00512739 cg23352245 cg12776173 cg19301366
## 200223270003_R02C01  0.4708431  0.8615873  0.7685695  0.8824858  0.8687522  0.2589014 0.38940091  0.2091538  0.6813307  0.5105903  0.9349248 0.08230254  0.9337648  0.9377232  0.1038804  0.8831393
## 200223270003_R03C01  0.8841930  0.8705287  0.7519166  0.9040272  0.7256813  0.6726133 0.37769608  0.2665896  0.7373055  0.5833923  0.9244259 0.09661586  0.8863895  0.9375774  0.8730635  0.8072679
## 200223270003_R06C01  0.4575646  0.8103777  0.2501173  0.8552121  0.8521881  0.2642560 0.04017909  0.2574003  0.5588114  0.5701428  0.9263996 0.52399749  0.9242748  0.5932742  0.7009491  0.8796022
##                     cg25758034 cg04316537 cg14687298 cg13226272 cg13372276 cg12556569 cg06277607 cg17002338 cg24307368 cg14627380 cg10091792 cg08584917 cg18819889 cg24697433 cg03084184 cg23159970
## 200223270003_R02C01  0.6114028  0.8074830 0.04206702 0.02637249 0.04888111 0.06218231 0.10744587  0.9286251 0.64323677  0.9455369  0.8670733  0.5663205  0.9156157  0.9243095  0.8162981 0.61817246
## 200223270003_R03C01  0.6649219  0.8453340 0.14813581 0.54100016 0.62396373 0.03924599 0.09353494  0.2684163 0.34980461  0.9258964  0.5864221  0.9019732  0.9004455  0.6808390  0.7877128 0.57492600
## 200223270003_R06C01  0.2393844  0.4351695 0.24260002 0.44370701 0.59693465 0.48636893 0.09504696  0.2811103 0.02720398  0.5789898  0.6087997  0.9187789  0.9054439  0.6384606  0.4546397 0.03288909
##                     cg22112152 cg12784167 cg08198851 cg17129965 cg00939409 cg08788093 cg09451339 cg20078646 cg10788927 cg16089727 cg00146240 cg15775217 cg18526121 cg01662749 cg14192979 cg03672288
## 200223270003_R02C01  0.8476101 0.81503498  0.6578905  0.8972140  0.2652180 0.03911678  0.2243746 0.06198170  0.8973154 0.86748697  0.6336151  0.5707441  0.4519781  0.3506201 0.06336040  0.9235592
## 200223270003_R03C01  0.8014136 0.02811410  0.6578186  0.8806673  0.8882671 0.60934160  0.2340702 0.89537412  0.2021398 0.54996692  0.8957183  0.9168327  0.4762313  0.2510946 0.06019651  0.6718625
## 200223270003_R06C01  0.7897897 0.03073269  0.1272153  0.8857237  0.8842646 0.88380243  0.8921284 0.08725521  0.2053075 0.05876736  0.1433218  0.6042521  0.4833367  0.8061480 0.52114282  0.9007629
##                     cg25306893 cg05392160 cg05321907 cg25277809 cg05876883 cg06715136 cg06483046 cg14307563 cg14170504 cg04497611 cg24139837 cg05161773 cg05593887 cg11286989 cg10240127 cg27160885
## 200223270003_R02C01  0.6265392  0.9328933  0.2880477  0.1632342  0.9039064  0.3400192 0.04383925  0.1855966 0.54915621  0.9086359 0.07404605  0.4120912  0.5939220  0.7590008  0.9250553  0.2231606
## 200223270003_R03C01  0.8330282  0.2576881  0.1782629  0.4913711  0.9223308  0.9259109 0.50720277  0.8916957 0.02236650  0.8818513 0.04183445  0.4154907  0.5766550  0.8533989  0.9403255  0.8263885
## 200223270003_R06C01  0.6175380  0.8920726  0.8427929  0.5952124  0.4697980  0.9079807 0.89604910  0.8750052 0.02988245  0.5853116 0.05657120  0.8526849  0.9148338  0.7313884  0.9056974  0.2121179
##                     cg01549082 cg04412904 cg14532717 cg06118351 cg22535849 cg11706829 cg00322003 cg08554146 cg02627240 cg18029737 cg17723206 cg03549208 cg21986118 cg05850457 cg09785377 cg14293999
## 200223270003_R02C01  0.2924138 0.05088595  0.5732280  0.3633940  0.8847704  0.8897234  0.1759911  0.8982080 0.66706843  0.9100454 0.92881042  0.9014487  0.6658175  0.8183013  0.9162088  0.2836710
## 200223270003_R03C01  0.7065693 0.07717659  0.1107638  0.4714860  0.8609966  0.5444785  0.5702070  0.8963074 0.57129408  0.9016634 0.48556255  0.8381784  0.6571296  0.8313023  0.9226292  0.9172023
## 200223270003_R06C01  0.2895440 0.08253743  0.6273416  0.8655962  0.8808022  0.5669449  0.3077122  0.8213878 0.05309659  0.7376586 0.01765023  0.9097817  0.7034445  0.8161364  0.6405193  0.9168166
##                     cg07138269 cg15985500 cg14780448 cg04124201 cg17738613 cg17906851 cg22169467 cg22071943 cg20981163 cg10039445 cg02246922 cg08896901 cg02631626 cg11247378 cg08857872 cg00295418
## 200223270003_R02C01  0.5002290  0.8555262  0.9119141  0.8686421  0.6879612  0.9488392  0.3095010  0.8705217  0.8990628  0.8833873  0.7301201  0.3581911  0.6280766  0.1591185  0.3395280 0.44954665
## 200223270003_R03C01  0.9426707  0.8312198  0.6702102  0.3308589  0.6582258  0.9529718  0.2978585  0.2442648  0.9264076  0.8954055  0.9447019  0.2467071  0.1951736  0.7874849  0.8181845 0.48471295
## 200223270003_R06C01  0.5057781  0.8492103  0.6207355  0.3241613  0.1022257  0.6462151  0.8955853  0.2644581  0.4874651  0.8832807  0.7202230  0.9225209  0.2699849  0.4807942  0.2970779 0.02004532
##                     cg14507637 cg18949721 cg11187460 cg12146221 cg08041188 cg04867412 cg00345083 cg11268585 cg21388339 cg12228670 cg23916408 cg26901661 cg21243064 cg06403901 cg15730644 cg00322820
## 200223270003_R02C01  0.9051258  0.2334245 0.03672179  0.2049284  0.7752456 0.04304823 0.47960968  0.2521544  0.2756268  0.8632174  0.1942275  0.8951971  0.5191606 0.92790690  0.4803181  0.4869764
## 200223270003_R03C01  0.9009460  0.2437792 0.92516409  0.1814927  0.3201255 0.87967997 0.50833875  0.8535791  0.2102269  0.8496212  0.9154993  0.8754981  0.9167649 0.04783341  0.4353906  0.4858988
## 200223270003_R06C01  0.9013686  0.2523095 0.03109553  0.8619250  0.7900939 0.44971146 0.03929249  0.9121931  0.7649181  0.8738949  0.8886255  0.9021064  0.4862205 0.05253626  0.8763048  0.4754313
##                     cg04645024 cg24643105 cg03221390 cg21139150 cg17131279 cg15501526 cg13653328 cg24470466 cg23836570 cg13038195 cg04664583
## 200223270003_R02C01  0.7366541  0.5303418  0.5859063 0.01853264  0.1900637  0.6362531  0.9245434  0.7725300 0.58688450 0.45882213  0.5572814
## 200223270003_R03C01  0.8454827  0.5042688  0.9180706 0.43223243  0.7048637  0.6319253  0.5122938  0.9041432 0.54259383 0.02740132  0.5881190
## 200223270003_R06C01  0.0871902  0.9383050  0.6399867 0.43772680  0.1492861  0.7435100  0.9362798  0.1206738 0.03267304 0.46284376  0.9352717
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Mean)
## [1] 648 251
print(selected_impAvg_ordered_NAME)
##   [1] "cg23432430" "PC3"        "PC2"        "cg00962106" "PC1"        "cg07158503" "cg06697310" "cg11331837" "cg07634717" "cg03660162" "cg24851651" "cg11019791" "cg20685672" "cg26081710" "cg14168080"
##  [16] "cg03749159" "cg20678988" "cg04156077" "cg19503462" "cg26853071" "age.now"    "cg11540596" "cg00415024" "cg10701746" "cg00004073" "cg11227702" "cg19471911" "cg09727210" "cg00154902" "cg17002719"
##  [31] "cg07504457" "cg25879395" "cg01008088" "cg02225060" "cg12543766" "cg09120722" "cg11787167" "cg19799454" "cg02887598" "cg01128042" "cg21697769" "cg25208881" "cg16779438" "cg17386240" "cg03088219"
##  [46] "cg24883219" "cg15535896" "cg16338321" "cg21757617" "cg18285382" "cg17429539" "cg10738648" "cg02078724" "cg09015880" "cg20823859" "cg18816397" "cg16431720" "cg06833284" "cg23517115" "cg11438323"
##  [61] "cg02932958" "cg08096656" "cg05064044" "cg05234269" "cg25169289" "cg14710850" "cg26679884" "cg03600007" "cg15098922" "cg01921484" "cg16715186" "cg06961873" "cg12240569" "cg01910713" "cg25712921"
##  [76] "cg00648024" "cg03982462" "cg08745107" "cg26983017" "cg00084271" "cg16858433" "cg06371647" "cg26846609" "cg15184869" "cg13573375" "cg04831745" "cg22931151" "cg18918831" "cg07640670" "cg15600437"
##  [91] "cg01280698" "cg12689021" "cg27577781" "cg13405878" "cg22666875" "cg16536985" "cg16202259" "cg18857647" "cg22305850" "cg27224751" "cg09247979" "cg12333628" "cg16571124" "cg03979311" "cg12421087"
## [106] "cg15700429" "cg13739190" "cg00819121" "cg25436480" "cg04768387" "cg24634455" "cg11133939" "cg17042243" "cg22542451" "cg01608425" "cg06864789" "cg06880438" "cg13387643" "cg12702014" "cg03737947"
## [121] "cg02823329" "cg00696044" "cg06960717" "cg20673830" "cg25649515" "cg10681981" "cg15633912" "cg02668233" "cg27272246" "cg18150287" "cg18339359" "cg04718469" "cg01933473" "cg02122327" "cg18993517"
## [136] "cg02495179" "cg02356645" "cg09216282" "cg09584650" "cg00512739" "cg23352245" "cg12776173" "cg19301366" "cg25758034" "cg04316537" "cg14687298" "cg13226272" "cg13372276" "cg12556569" "cg06277607"
## [151] "cg17002338" "cg24307368" "cg14627380" "cg10091792" "cg08584917" "cg18819889" "cg24697433" "cg03084184" "cg23159970" "cg22112152" "cg12784167" "cg08198851" "cg17129965" "cg00939409" "cg08788093"
## [166] "cg09451339" "cg20078646" "cg10788927" "cg16089727" "cg00146240" "cg15775217" "cg18526121" "cg01662749" "cg14192979" "cg03672288" "cg25306893" "cg05392160" "cg05321907" "cg25277809" "cg05876883"
## [181] "cg06715136" "cg06483046" "cg14307563" "cg14170504" "cg04497611" "cg24139837" "cg05161773" "cg05593887" "cg11286989" "cg10240127" "cg27160885" "cg01549082" "cg04412904" "cg14532717" "cg06118351"
## [196] "cg22535849" "cg11706829" "cg00322003" "cg08554146" "cg02627240" "cg18029737" "cg17723206" "cg03549208" "cg21986118" "cg05850457" "cg09785377" "cg14293999" "cg07138269" "cg15985500" "cg14780448"
## [211] "cg04124201" "cg17738613" "cg17906851" "cg22169467" "cg22071943" "cg20981163" "cg10039445" "cg02246922" "cg08896901" "cg02631626" "cg11247378" "cg08857872" "cg00295418" "cg14507637" "cg18949721"
## [226] "cg11187460" "cg12146221" "cg08041188" "cg04867412" "cg00345083" "cg11268585" "cg21388339" "cg12228670" "cg23916408" "cg26901661" "cg21243064" "cg06403901" "cg15730644" "cg00322820" "cg04645024"
## [241] "cg24643105" "cg03221390" "cg21139150" "cg17131279" "cg15501526" "cg13653328" "cg24470466" "cg23836570" "cg13038195" "cg04664583"
output_mean_process<-processed_data[,c("DX",selected_impAvg_ordered_NAME)]
print(head(output_mean_process))
## # A tibble: 6 × 251
##   DX    cg23432430      PC3       PC2 cg00962106      PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080 cg03749159 cg20678988 cg04156077
##   <fct>      <dbl>    <dbl>     <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CI         0.948 -0.0140    1.47e-2      0.912 -0.214        0.578      0.845     0.0369      0.748      0.869     0.0367      0.811     0.671       0.875      0.419      0.936      0.844      0.732
## 2 CN         0.946  0.00506   5.75e-2      0.538 -0.173        0.620      0.865     0.572       0.825      0.516     0.0536      0.783     0.793       0.920      0.442      0.915      0.855      0.687
## 3 CN         0.942  0.0291    8.37e-2      0.504 -0.00367      0.624      0.241     0.0318      0.818      0.903     0.0597      0.435     0.661       0.880      0.436      0.926      0.779      0.850
## 4 CI         0.943 -0.0323   -1.12e-2      0.904 -0.187        0.599      0.848     0.0383      0.758      0.531     0.609       0.850     0.808       0.915      0.957      0.629      0.826      0.680
## 5 CI         0.946  0.0529    1.65e-5      0.896  0.0268       0.631      0.821     0.930       0.826      0.926     0.0883      0.854     0.0829      0.917      0.946      0.929      0.330      0.891
## 6 CN         0.951 -0.00869   1.57e-2      0.886 -0.0379       0.615      0.784     0.540       0.210      0.894     0.919       0.738     0.845       0.923      0.399      0.612      0.854      0.837
## # ℹ 232 more variables: cg19503462 <dbl>, cg26853071 <dbl>, age.now <dbl>, cg11540596 <dbl>, cg00415024 <dbl>, cg10701746 <dbl>, cg00004073 <dbl>, cg11227702 <dbl>, cg19471911 <dbl>,
## #   cg09727210 <dbl>, cg00154902 <dbl>, cg17002719 <dbl>, cg07504457 <dbl>, cg25879395 <dbl>, cg01008088 <dbl>, cg02225060 <dbl>, cg12543766 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>,
## #   cg19799454 <dbl>, cg02887598 <dbl>, cg01128042 <dbl>, cg21697769 <dbl>, cg25208881 <dbl>, cg16779438 <dbl>, cg17386240 <dbl>, cg03088219 <dbl>, cg24883219 <dbl>, cg15535896 <dbl>,
## #   cg16338321 <dbl>, cg21757617 <dbl>, cg18285382 <dbl>, cg17429539 <dbl>, cg10738648 <dbl>, cg02078724 <dbl>, cg09015880 <dbl>, cg20823859 <dbl>, cg18816397 <dbl>, cg16431720 <dbl>,
## #   cg06833284 <dbl>, cg23517115 <dbl>, cg11438323 <dbl>, cg02932958 <dbl>, cg08096656 <dbl>, cg05064044 <dbl>, cg05234269 <dbl>, cg25169289 <dbl>, cg14710850 <dbl>, cg26679884 <dbl>,
## #   cg03600007 <dbl>, cg15098922 <dbl>, cg01921484 <dbl>, cg16715186 <dbl>, cg06961873 <dbl>, cg12240569 <dbl>, cg01910713 <dbl>, cg25712921 <dbl>, cg00648024 <dbl>, cg03982462 <dbl>,
## #   cg08745107 <dbl>, cg26983017 <dbl>, cg00084271 <dbl>, cg16858433 <dbl>, cg06371647 <dbl>, cg26846609 <dbl>, cg15184869 <dbl>, cg13573375 <dbl>, cg04831745 <dbl>, cg22931151 <dbl>, …
dim(output_mean_process)
## [1] 648 251

Based on Median

Selected_median_imp <- head(combined_importance_quantiles,n = Number_fea_input)
print(head(Selected_median_imp))
##        Feature          0%        25%       50%       75%      100%
## 275 cg23432430 0.491990101 0.62272860 1.0000000 1.0000000 1.0000000
## 313        PC3 0.000000000 0.16666667 0.5539209 1.0000000 1.0000000
## 1      age.now 0.003392297 0.00416345 0.5000000 0.5289698 0.6349744
## 311        PC1 0.032231598 0.30910796 0.5000000 0.6684105 0.7598925
## 312        PC2 0.333333333 0.38044700 0.4308152 0.5468701 0.8452667
## 104 cg07158503 0.310031430 0.40415432 0.4231865 0.5000000 0.5113692
Selected_median_imp_Name<-Selected_median_imp$Feature
print(head(Selected_median_imp_Name))
## [1] "cg23432430" "PC3"        "age.now"    "PC1"        "PC2"        "cg07158503"
df_selected_Median <- processed_dataFrame[,c("DX",Selected_median_imp_Name)]
output_median_feature<-processed_data[,c("DX",Selected_median_imp_Name)]
  
print(head(df_selected_Median))
##                     DX cg23432430          PC3 age.now          PC1        PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880
## 200223270003_R02C01 CI  0.9482702 -0.014043316    82.4 -0.214185447 0.01470293  0.5777146  0.9124898  0.7483382  0.8454609  0.4190123  0.8691767  0.6828159  0.7116230  0.4795503  0.8438718  0.5101716
## 200223270003_R03C01 CN  0.9455418  0.005055871    78.6 -0.172761185 0.05745834  0.6203543  0.5375751  0.8254434  0.8653044  0.4420256  0.5160770  0.8265195  0.6854539  0.4868342  0.8548886  0.8402106
## 200223270003_R06C01 CN  0.9418716  0.029143653    80.4 -0.003667305 0.08372861  0.6236025  0.5040948  0.8181246  0.2405168  0.4355521  0.9026304  0.5209552  0.7205633  0.4927257  0.7786685  0.8472063
##                     cg19799454 cg00004073 cg00154902 cg02887598 cg09727210 cg11227702 cg11331837 cg16338321 cg24851651 cg25208881 cg19503462 cg03749159  cg03088219 cg26081710 cg09120722 cg11787167
## 200223270003_R02C01  0.9178930 0.02928535  0.5137741 0.04020908  0.4240111 0.86486075 0.03692842  0.5350242 0.03674702  0.1851956  0.7951675  0.9355921 0.844002862  0.8751040  0.5878977 0.03853894
## 200223270003_R03C01  0.9106247 0.02787198  0.8540746 0.67073881  0.8812928 0.49184121 0.57150125  0.8294062 0.05358297  0.9092286  0.4537684  0.9153921 0.007435243  0.9198212  0.8287506 0.04673831
## 200223270003_R06C01  0.9066551 0.64576463  0.8188126 0.73408417  0.8493743 0.02543724 0.03182862  0.4918708 0.05968923  0.9265502  0.6997359  0.9255807 0.120155222  0.8801892  0.8793344 0.32564508
##                     cg12543766 cg19471911 cg11540596 cg01921484 cg00415024 cg12689021 cg21757617 cg01128042 cg17002719 cg16715186 cg05234269 cg12421087 cg05064044 cg15184869 cg23517115 cg00819121
## 200223270003_R02C01 0.51028134  0.6334393  0.9238951  0.9098550  0.4299553  0.7706828 0.03652647  0.9113420 0.04939181  0.2742789 0.93848584  0.5647607  0.5672851  0.8622328  0.2151144  0.9207001
## 200223270003_R03C01 0.88741539  0.8437175  0.8926595  0.9093137  0.3999122  0.7449475 0.44299089  0.5328806 0.40466475  0.7946153 0.57461229  0.5399655  0.5358875  0.8996252  0.9131440  0.9281472
## 200223270003_R06C01 0.02818501  0.6127952  0.8820252  0.9204487  0.7465084  0.7872237 0.44725379  0.5222757 0.51428089  0.8124316 0.02467208  0.5400348  0.5273964  0.8688117  0.8328364  0.9327211
##                     cg11019791 cg04156077 cg01910713 cg16779438 cg25169289 cg03979311 cg14710850 cg00648024 cg25712921 cg27272246 cg18816397 cg18285382 cg08096656 cg15535896 cg13573375 cg20673830
## 200223270003_R02C01  0.8112324  0.7321883  0.8573169  0.8826150  0.1100884 0.86644909  0.8048592 0.51410972  0.2829848  0.8615873  0.5472925  0.3202927  0.9362594  0.3382952  0.8670419  0.2422052
## 200223270003_R03C01  0.7831231  0.6865805  0.8538850  0.5466924  0.7667174 0.06199853  0.8090950 0.40202875  0.6220919  0.8705287  0.4940355  0.2930577  0.9314878  0.9253926  0.1733934  0.6881735
## 200223270003_R06C01  0.4353250  0.8501188  0.8110366  0.8629492  0.2264993 0.72615553  0.8285902 0.05579011  0.6384003  0.8103777  0.5337018  0.8923595  0.4943033  0.3320191  0.8888246  0.2134634
##                     cg26853071 cg15600437 cg16431720 cg25436480 cg27577781 cg06277607 cg08745107 cg03982462 cg25879395 cg20823859 cg06960717 cg06961873 cg10738648 cg20685672 cg09584650 cg07640670
## 200223270003_R02C01  0.4233820  0.4885353  0.7356099  0.8425160  0.8143535 0.10744587 0.02921338  0.8562777 0.88130864  0.9030711  0.7030978  0.5335591 0.44931577  0.6712101 0.08230254 0.58296513
## 200223270003_R03C01  0.7451354  0.4894487  0.8692449  0.4994032  0.8113185 0.09353494 0.78542320  0.6023731 0.02603438  0.6062985  0.7653402  0.5472606 0.49894016  0.7932091 0.09661586 0.55225610
## 200223270003_R06C01  0.4228079  0.8551374  0.8773137  0.3494312  0.8144274 0.09504696 0.02709928  0.8778458 0.91060615  0.8917348  0.7206218  0.9415177 0.05552024  0.6613646 0.52399749 0.04058533
##                     cg12702014 cg16858433 cg00512739 cg15098922 cg26679884 cg16536985 cg24883219 cg05876883 cg06371647 cg02823329 cg12556569 cg22666875 cg13387643 cg09216282 cg02078724 cg15700429
## 200223270003_R02C01  0.7704049  0.9184356  0.9337648  0.9286092  0.6793815  0.5789643  0.6430473  0.9039064  0.8336894  0.9462397 0.06218231  0.8177182  0.4229959  0.9349248  0.3096774  0.7879010
## 200223270003_R03C01  0.7848681  0.9194211  0.8863895  0.9027517  0.1848705  0.5418687  0.6822115  0.9223308  0.8198684  0.6464005 0.03924599  0.8291957  0.4200273  0.9244259  0.2896133  0.9114530
## 200223270003_R06C01  0.8065993  0.9271632  0.9242748  0.8525611  0.1701734  0.8392044  0.5296903  0.4697980  0.8069537  0.9633930 0.48636893  0.3694180  0.4161488  0.9263996  0.2805612  0.8838233
##                     cg17429539 cg08584917 cg01608425 cg08788093 cg22542451 cg00084271 cg21697769 cg05593887 cg18918831 cg08198851 cg22931151 cg18857647 cg18150287 cg00939409 cg01008088 cg17723206
## 200223270003_R02C01  0.7860900  0.5663205  0.9030410 0.03911678  0.5884356  0.8103611  0.8946108  0.5939220  0.4891660  0.6578905  0.9311023  0.8582332  0.7685695  0.2652180  0.8424817 0.92881042
## 200223270003_R03C01  0.7100923  0.9019732  0.9264388 0.60934160  0.8337068  0.7877006  0.2822953  0.5766550  0.5333801  0.6578186  0.9356702  0.8394132  0.7519166  0.8882671  0.2417656 0.48556255
## 200223270003_R06C01  0.7660838  0.9187789  0.8887753 0.88380243  0.8125084  0.7706165  0.8698740  0.9148338  0.6406575  0.1272153  0.9328614  0.2647491  0.2501173  0.8842646  0.2618620 0.01765023
##                     cg05321907 cg12776173 cg02932958 cg09247979 cg14170504 cg25306893 cg25758034 cg25649515 cg22305850 cg13405878 cg14687298 cg12240569 cg19301366 cg05161773 cg11133939 cg01933473
## 200223270003_R02C01  0.2880477  0.1038804  0.7901008  0.5070956 0.54915621  0.6265392  0.6114028  0.9279829 0.03361934  0.4549662 0.04206702 0.82772064  0.8831393  0.4120912  0.1282694  0.2589014
## 200223270003_R03C01  0.1782629  0.8730635  0.4210489  0.5706177 0.02236650  0.8330282  0.6649219  0.9235753 0.57522232  0.7858042 0.14813581 0.02690547  0.8072679  0.4154907  0.5920898  0.6726133
## 200223270003_R06C01  0.8427929  0.7009491  0.3825995  0.5090215 0.02988245  0.6175380  0.2393844  0.5895839 0.58548744  0.7583938 0.24260002 0.46030640  0.8796022  0.8526849  0.5127706  0.2642560
##                     cg26983017 cg24697433 cg18993517 cg02122327 cg11706829 cg17906851 cg17386240 cg15633912 cg16571124 cg03549208 cg02495179 cg06880438 cg10681981 cg13739190 cg09785377 cg11438323
## 200223270003_R02C01 0.89868232  0.9243095  0.2091538 0.38940091  0.8897234  0.9488392  0.7473400  0.1605530  0.9282854  0.9014487  0.6813307  0.8285145  0.7035090  0.8510103  0.9162088  0.4863471
## 200223270003_R03C01 0.03145466  0.6808390  0.2665896 0.37769608  0.5444785  0.9529718  0.7144809  0.9333421  0.9206431  0.8381784  0.7373055  0.7988881  0.7382662  0.8358482  0.9226292  0.8984559
## 200223270003_R06C01 0.84677625  0.6384606  0.2574003 0.04017909  0.5669449  0.6462151  0.8074824  0.8737362  0.9276842  0.9097817  0.5588114  0.7839538  0.6971989  0.8419471  0.6405193  0.8722772
##                     cg22071943 cg26846609 cg24634455 cg01280698 cg06833284 cg02668233 cg04831745 cg00322003 cg01662749 cg24307368 cg04497611 cg00146240 cg00696044 cg02627240 cg03672288 cg03737947
## 200223270003_R02C01  0.8705217 0.48860949  0.7796391  0.8985067  0.9125144  0.4708431 0.61984995  0.1759911  0.3506201 0.64323677  0.9086359  0.6336151 0.55608424 0.66706843  0.9235592 0.91824910
## 200223270003_R03C01  0.2442648 0.04878986  0.5188241  0.8846201  0.9003482  0.8841930 0.71214149  0.5702070  0.2510946 0.34980461  0.8818513  0.8957183 0.07552381 0.57129408  0.6718625 0.92067153
## 200223270003_R06C01  0.2644581 0.48026945  0.5325725  0.8847132  0.6097933  0.4575646 0.06871768  0.3077122  0.8061480 0.02720398  0.5853116  0.1433218 0.79270858 0.05309659  0.9007629 0.03638091
##                     cg04316537 cg06118351 cg06403901 cg06483046 cg06864789 cg07138269 cg08554146 cg08857872 cg10240127 cg11187460 cg11286989 cg11314779 cg12228670 cg13372276 cg13653328 cg14293999
## 200223270003_R02C01  0.8074830  0.3633940 0.92790690 0.04383925 0.05369415  0.5002290  0.8982080  0.3395280  0.9250553 0.03672179  0.7590008  0.0242134  0.8632174 0.04888111  0.9245434  0.2836710
## 200223270003_R03C01  0.8453340  0.4714860 0.04783341 0.50720277 0.46053125  0.9426707  0.8963074  0.8181845  0.9403255 0.92516409  0.8533989  0.8966100  0.8496212 0.62396373  0.5122938  0.9172023
## 200223270003_R06C01  0.4351695  0.8655962 0.05253626 0.89604910 0.87513655  0.5057781  0.8213878  0.2970779  0.9056974 0.03109553  0.7313884  0.8908661  0.8738949 0.59693465  0.9362798  0.9168166
##                     cg14532717 cg14780448 cg15730644 cg15985500 cg17002338 cg17042243 cg17738613 cg18819889 cg18949721 cg21986118 cg23066280 cg23916408 cg24139837 cg25277809 cg27160885 cg05392160
## 200223270003_R02C01  0.5732280  0.9119141  0.4803181  0.8555262  0.9286251  0.2502905  0.6879612  0.9156157  0.2334245  0.6658175 0.07247841  0.1942275 0.07404605  0.1632342  0.2231606  0.9328933
## 200223270003_R03C01  0.1107638  0.6702102  0.4353906  0.8312198  0.2684163  0.2933475  0.6582258  0.9004455  0.2437792  0.6571296 0.57174588  0.9154993 0.04183445  0.4913711  0.8263885  0.2576881
## 200223270003_R06C01  0.6273416  0.6207355  0.8763048  0.8492103  0.2811103  0.2725457  0.1022257  0.9054439  0.2523095  0.7034445 0.80814756  0.8886255 0.05657120  0.5952124  0.2121179  0.8920726
##                     cg02631626 cg23352245 cg21139150 cg04124201 cg10666341 cg18339359 cg22169467 cg04888234 cg25059696 cg06715136 cg03600007 cg10091792 cg14192979 cg20078646 cg27224751 cg04412904
## 200223270003_R02C01  0.6280766  0.9377232 0.01853264  0.8686421  0.9046648  0.8824858  0.3095010  0.8379655  0.9017504  0.3400192  0.5658487  0.8670733 0.06336040 0.06198170 0.44503947 0.05088595
## 200223270003_R03C01  0.1951736  0.9375774 0.43223243  0.3308589  0.6731062  0.9040272  0.2978585  0.4376314  0.3047156  0.9259109  0.6018832  0.5864221 0.06019651 0.89537412 0.03214912 0.07717659
## 200223270003_R06C01  0.2699849  0.5932742 0.43772680  0.3241613  0.6443180  0.8552121  0.8955853  0.8039047  0.3051179  0.9079807  0.8611166  0.6087997 0.52114282 0.08725521 0.83123722 0.08253743
##                     cg17129965 cg14507637 cg14307563 cg20981163 cg22535849 cg18029737 cg14627380 cg10788927 cg08041188 cg13226272 cg11247378 cg02772171 cg04462915 cg03221390 cg22112152 cg04664583
## 200223270003_R02C01  0.8972140  0.9051258  0.1855966  0.8990628  0.8847704  0.9100454  0.9455369  0.8973154  0.7752456 0.02637249  0.1591185  0.9182018 0.03224861  0.5859063  0.8476101  0.5572814
## 200223270003_R03C01  0.8806673  0.9009460  0.8916957  0.9264076  0.8609966  0.9016634  0.9258964  0.2021398  0.3201255 0.54100016  0.7874849  0.5660559 0.50740695  0.9180706  0.8014136  0.5881190
## 200223270003_R06C01  0.8857237  0.9013686  0.8750052  0.4874651  0.8808022  0.7376586  0.5789898  0.2053075  0.7900939 0.44370701  0.4807942  0.8995479 0.02700644  0.6399867  0.7897897  0.9352717
##                     cg20803293 cg09451339 cg16733676 cg22741595 cg04242342 cg00295418 cg06012903 cg00345083 cg10039445 cg13368637 cg04718469 cg16089727 cg06231502 cg02550738 cg05850457 cg08896901
## 200223270003_R02C01 0.54933918  0.2243746  0.9057228  0.6525533  0.8206769 0.44954665  0.7964595 0.47960968  0.8833873  0.5597507  0.8687522 0.86748697  0.7784451  0.6201457  0.8183013  0.3581911
## 200223270003_R03C01 0.07935747  0.2340702  0.8904541  0.1730013  0.8167892 0.48471295  0.1933431 0.50833875  0.8954055  0.9100088  0.7256813 0.54996692  0.7964278  0.9011727  0.8313023  0.2467071
## 200223270003_R06C01 0.42191244  0.8921284  0.1698111  0.1550739  0.8040357 0.02004532  0.1960773 0.03929249  0.8832807  0.8739205  0.8521881 0.05876736  0.7706160  0.9085849  0.8161364  0.9225209
##                     cg17268094 cg01549082 cg12146221 cg06394820 cg26901661 cg12784167 cg13815695 cg01462799 cg00322820 cg02356645
## 200223270003_R02C01  0.5774753  0.2924138  0.2049284  0.8513195  0.8951971 0.81503498  0.9267057  0.8284427  0.4869764  0.5105903
## 200223270003_R03C01  0.9003262  0.7065693  0.1814927  0.8695521  0.8754981 0.02811410  0.6859729  0.4038824  0.4858988  0.5833923
## 200223270003_R06C01  0.8789368  0.2895440  0.8619250  0.4415020  0.9021064 0.03073269  0.6509046  0.4676821  0.4754313  0.5701428
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Median)
## [1] 648 251
print(Selected_median_imp_Name)
##   [1] "cg23432430" "PC3"        "age.now"    "PC1"        "PC2"        "cg07158503" "cg00962106" "cg07634717" "cg06697310" "cg14168080" "cg03660162" "cg02225060" "cg07504457" "cg10701746" "cg20678988"
##  [16] "cg09015880" "cg19799454" "cg00004073" "cg00154902" "cg02887598" "cg09727210" "cg11227702" "cg11331837" "cg16338321" "cg24851651" "cg25208881" "cg19503462" "cg03749159" "cg03088219" "cg26081710"
##  [31] "cg09120722" "cg11787167" "cg12543766" "cg19471911" "cg11540596" "cg01921484" "cg00415024" "cg12689021" "cg21757617" "cg01128042" "cg17002719" "cg16715186" "cg05234269" "cg12421087" "cg05064044"
##  [46] "cg15184869" "cg23517115" "cg00819121" "cg11019791" "cg04156077" "cg01910713" "cg16779438" "cg25169289" "cg03979311" "cg14710850" "cg00648024" "cg25712921" "cg27272246" "cg18816397" "cg18285382"
##  [61] "cg08096656" "cg15535896" "cg13573375" "cg20673830" "cg26853071" "cg15600437" "cg16431720" "cg25436480" "cg27577781" "cg06277607" "cg08745107" "cg03982462" "cg25879395" "cg20823859" "cg06960717"
##  [76] "cg06961873" "cg10738648" "cg20685672" "cg09584650" "cg07640670" "cg12702014" "cg16858433" "cg00512739" "cg15098922" "cg26679884" "cg16536985" "cg24883219" "cg05876883" "cg06371647" "cg02823329"
##  [91] "cg12556569" "cg22666875" "cg13387643" "cg09216282" "cg02078724" "cg15700429" "cg17429539" "cg08584917" "cg01608425" "cg08788093" "cg22542451" "cg00084271" "cg21697769" "cg05593887" "cg18918831"
## [106] "cg08198851" "cg22931151" "cg18857647" "cg18150287" "cg00939409" "cg01008088" "cg17723206" "cg05321907" "cg12776173" "cg02932958" "cg09247979" "cg14170504" "cg25306893" "cg25758034" "cg25649515"
## [121] "cg22305850" "cg13405878" "cg14687298" "cg12240569" "cg19301366" "cg05161773" "cg11133939" "cg01933473" "cg26983017" "cg24697433" "cg18993517" "cg02122327" "cg11706829" "cg17906851" "cg17386240"
## [136] "cg15633912" "cg16571124" "cg03549208" "cg02495179" "cg06880438" "cg10681981" "cg13739190" "cg09785377" "cg11438323" "cg22071943" "cg26846609" "cg24634455" "cg01280698" "cg06833284" "cg02668233"
## [151] "cg04831745" "cg00322003" "cg01662749" "cg24307368" "cg04497611" "cg00146240" "cg00696044" "cg02627240" "cg03672288" "cg03737947" "cg04316537" "cg06118351" "cg06403901" "cg06483046" "cg06864789"
## [166] "cg07138269" "cg08554146" "cg08857872" "cg10240127" "cg11187460" "cg11286989" "cg11314779" "cg12228670" "cg13372276" "cg13653328" "cg14293999" "cg14532717" "cg14780448" "cg15730644" "cg15985500"
## [181] "cg17002338" "cg17042243" "cg17738613" "cg18819889" "cg18949721" "cg21986118" "cg23066280" "cg23916408" "cg24139837" "cg25277809" "cg27160885" "cg05392160" "cg02631626" "cg23352245" "cg21139150"
## [196] "cg04124201" "cg10666341" "cg18339359" "cg22169467" "cg04888234" "cg25059696" "cg06715136" "cg03600007" "cg10091792" "cg14192979" "cg20078646" "cg27224751" "cg04412904" "cg17129965" "cg14507637"
## [211] "cg14307563" "cg20981163" "cg22535849" "cg18029737" "cg14627380" "cg10788927" "cg08041188" "cg13226272" "cg11247378" "cg02772171" "cg04462915" "cg03221390" "cg22112152" "cg04664583" "cg20803293"
## [226] "cg09451339" "cg16733676" "cg22741595" "cg04242342" "cg00295418" "cg06012903" "cg00345083" "cg10039445" "cg13368637" "cg04718469" "cg16089727" "cg06231502" "cg02550738" "cg05850457" "cg08896901"
## [241] "cg17268094" "cg01549082" "cg12146221" "cg06394820" "cg26901661" "cg12784167" "cg13815695" "cg01462799" "cg00322820" "cg02356645"
print(head(output_median_feature))
## # A tibble: 6 × 251
##   DX    cg23432430      PC3 age.now      PC1        PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880 cg19799454 cg00004073
##   <fct>      <dbl>    <dbl>   <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CI         0.948 -0.0140     82.4 -0.214    0.0147         0.578      0.912      0.748      0.845      0.419      0.869      0.683      0.712      0.480      0.844      0.510      0.918     0.0293
## 2 CN         0.946  0.00506    78.6 -0.173    0.0575         0.620      0.538      0.825      0.865      0.442      0.516      0.827      0.685      0.487      0.855      0.840      0.911     0.0279
## 3 CN         0.942  0.0291     80.4 -0.00367  0.0837         0.624      0.504      0.818      0.241      0.436      0.903      0.521      0.721      0.493      0.779      0.847      0.907     0.646 
## 4 CI         0.943 -0.0323     78.2 -0.187   -0.0112         0.599      0.904      0.758      0.848      0.957      0.531      0.808      0.187      0.855      0.826      0.487      0.922     0.624 
## 5 CI         0.946  0.0529     62.9  0.0268   0.0000165      0.631      0.896      0.826      0.821      0.946      0.926      0.608      0.235      0.488      0.330      0.889      0.914     0.412 
## 6 CN         0.951 -0.00869    80.7 -0.0379   0.0157         0.615      0.886      0.210      0.784      0.399      0.894      0.764      0.730      0.842      0.854      0.906      0.921     0.393 
## # ℹ 232 more variables: cg00154902 <dbl>, cg02887598 <dbl>, cg09727210 <dbl>, cg11227702 <dbl>, cg11331837 <dbl>, cg16338321 <dbl>, cg24851651 <dbl>, cg25208881 <dbl>, cg19503462 <dbl>,
## #   cg03749159 <dbl>, cg03088219 <dbl>, cg26081710 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>, cg12543766 <dbl>, cg19471911 <dbl>, cg11540596 <dbl>, cg01921484 <dbl>, cg00415024 <dbl>,
## #   cg12689021 <dbl>, cg21757617 <dbl>, cg01128042 <dbl>, cg17002719 <dbl>, cg16715186 <dbl>, cg05234269 <dbl>, cg12421087 <dbl>, cg05064044 <dbl>, cg15184869 <dbl>, cg23517115 <dbl>,
## #   cg00819121 <dbl>, cg11019791 <dbl>, cg04156077 <dbl>, cg01910713 <dbl>, cg16779438 <dbl>, cg25169289 <dbl>, cg03979311 <dbl>, cg14710850 <dbl>, cg00648024 <dbl>, cg25712921 <dbl>,
## #   cg27272246 <dbl>, cg18816397 <dbl>, cg18285382 <dbl>, cg08096656 <dbl>, cg15535896 <dbl>, cg13573375 <dbl>, cg20673830 <dbl>, cg26853071 <dbl>, cg15600437 <dbl>, cg16431720 <dbl>,
## #   cg25436480 <dbl>, cg27577781 <dbl>, cg06277607 <dbl>, cg08745107 <dbl>, cg03982462 <dbl>, cg25879395 <dbl>, cg20823859 <dbl>, cg06960717 <dbl>, cg06961873 <dbl>, cg10738648 <dbl>,
## #   cg20685672 <dbl>, cg09584650 <dbl>, cg07640670 <dbl>, cg12702014 <dbl>, cg16858433 <dbl>, cg00512739 <dbl>, cg15098922 <dbl>, cg26679884 <dbl>, cg16536985 <dbl>, cg24883219 <dbl>, …

Based on Frequency

Function for Frequency Selection

Choose the mutually important features: a feature is kept when it appears in the top list of at least half of the models (i.e. 3 of the 5 models in our case).

The frequency / common feature importance is processed as follows:

  1. Select the TOP Number of features for each model (this number is set via “Number_fea_input” in this session, Number_fea_input <- INPUT_NUMBER_FEATURES, with “INPUT_NUMBER_FEATURES” defined in the INPUT session).
  2. Calculate the frequency of appearance of each feature across the top lists selected in step 1.
  3. Each feature that appears in at least half of the models is considered important; these important features are collected as the common features (see the toy sketch after this list).
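As a minimal toy sketch of steps 2 and 3 (the feature names below are hypothetical and not part of the pipeline):

# Toy illustration only: three models with their top-2 feature names
top_lists <- list(LRM = c("cgA", "cgB"),
                  XGB = c("cgA", "cgC"),
                  RF  = c("cgB", "cgA"))
# Step 2: frequency of each feature across the top lists
freq <- table(unlist(top_lists, use.names = FALSE))
# cgA appears 3 times, cgB twice, cgC once
# Step 3: keep features appearing in at least half of the models (>= 2 of 3)
common_features <- names(freq)[freq >= ceiling(length(top_lists) / 2)]
# common_features is c("cgA", "cgB")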
n_select_frequencyWay <- Number_fea_input
# Use the averaged, ordered importance table computed earlier
combined_importance_freq_ordered_df <- combined_importance_Avg_ordered
df_Selected_Frequency_Imp <- function(n_select_frequencyWay, FeatureImportanceTable){
# In this function, we input the feature importance data frame
# and process it with the steps discussed above.
# The output is the feature frequency table
#  (i.e. the frequency of appearance of each feature based on the
#   Top Number of features selected per model).

  # Map each model label to its importance column in the input table
  importance_cols <- c(LRM = "Importance_LRM1", XGB = "Importance_XGB",
                       ENM = "Importance_ENM1", RF  = "Importance_RF",
                       SVM = "Importance_SVM")

  # For each model: order the table by that model's importance (descending)
  # and keep the names of its top-n features
  top_names <- lapply(importance_cols, function(col){
    ordered <- FeatureImportanceTable[order(-FeatureImportanceTable[[col]]), ]
    head(ordered, n = n_select_frequencyWay)$Feature
  })

  # Union of all features selected by at least one model,
  # in order of first appearance (LRM, XGB, ENM, RF, SVM)
  all_features <- unique(unlist(top_names, use.names = FALSE))

  # Presence (1) / absence (0) of each feature in each model's top list
  feature_matrix <- sapply(top_names, function(nm) as.integer(all_features %in% nm))
  rownames(feature_matrix) <- all_features

  # Convert the matrix to a data frame and count in how many models
  # each feature appears
  feature_df <- as.data.frame(feature_matrix)
  feature_df$Total_Count <- rowSums(feature_df)
  # Sort the data frame by Total_Count in descending order
  feature_df <- feature_df[order(-feature_df$Total_Count), ]
  print(feature_df)
  return(feature_df)
}

Now the function is tested below; its output should reproduce the previously computed frequency table (frequency_feature_df_RAW_ordered):

df_Func_test<-df_Selected_Frequency_Imp(NUM_COMMON_FEATURES_SET_Frequency,combined_importance_freq_ordered_df)
##            LRM XGB ENM RF SVM Total_Count
## cg23432430   1   1   1  1   1           5
## PC2          1   1   1  1   0           4
## cg07158503   1   1   1  0   1           4
## PC3          1   0   1  1   0           3
## PC1          1   0   1  0   1           3
## cg00962106   1   0   1  0   1           3
## cg06697310   1   0   1  1   0           3
## cg26081710   1   0   1  0   1           3
## cg09727210   1   0   1  0   0           2
## cg02225060   1   0   1  0   0           2
## cg09015880   1   0   1  0   0           2
## cg16338321   1   0   1  0   0           2
## cg00819121   1   0   1  0   0           2
## cg00415024   1   0   0  1   0           2
## cg21757617   1   0   1  0   0           2
## cg14168080   1   1   0  0   0           2
## cg02887598   1   0   1  0   0           2
## cg05064044   1   0   1  0   0           2
## cg03660162   0   1   0  0   1           2
## cg07634717   0   1   0  0   1           2
## cg19799454   0   1   0  0   1           2
## cg20678988   0   1   0  0   1           2
## cg11019791   0   1   0  1   0           2
## cg10701746   1   0   0  0   0           1
## cg01910713   1   0   0  0   0           1
## age.now      0   1   0  0   0           1
## cg11438323   0   1   0  0   0           1
## cg11540596   0   1   0  0   0           1
## cg17002719   0   1   0  0   0           1
## cg09120722   0   1   0  0   0           1
## cg17002338   0   1   0  0   0           1
## cg11227702   0   1   0  0   0           1
## cg18816397   0   1   0  0   0           1
## cg02122327   0   1   0  0   0           1
## cg13573375   0   1   0  0   0           1
## cg03088219   0   1   0  0   0           1
## cg06277607   0   0   1  0   0           1
## cg27272246   0   0   1  0   0           1
## cg00004073   0   0   1  0   0           1
## cg17429539   0   0   1  0   0           1
## cg03749159   0   0   0  1   0           1
## cg11331837   0   0   0  1   0           1
## cg21697769   0   0   0  1   0           1
## cg01008088   0   0   0  1   0           1
## cg04768387   0   0   0  1   0           1
## cg16431720   0   0   0  1   0           1
## cg12784167   0   0   0  1   0           1
## cg23159970   0   0   0  1   0           1
## cg24851651   0   0   0  1   0           1
## cg17042243   0   0   0  1   0           1
## cg09451339   0   0   0  1   0           1
## cg17386240   0   0   0  1   0           1
## cg04109990   0   0   0  1   0           1
## cg14192979   0   0   0  1   0           1
## cg12333628   0   0   0  0   1           1
## cg20685672   0   0   0  0   1           1
## cg26853071   0   0   0  0   1           1
## cg24883219   0   0   0  0   1           1
## cg06833284   0   0   0  0   1           1
## cg03600007   0   0   0  0   1           1
## cg01280698   0   0   0  0   1           1
## cg13226272   0   0   0  0   1           1
## cg15775217   0   0   0  0   1           1
## cg04156077   0   0   0  0   1           1
## cg19503462   0   0   0  0   1           1
# Count of mismatching cells between the two tables; expected to be zero.
sum(df_Func_test!=frequency_feature_df_RAW_ordered)
## [1] 0
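If a hard guard is preferred, the same check can be written as an assertion (a minimal sketch using base R's stopifnot):

# Abort the report with an error if the function output diverges from
# the previously computed frequency table
stopifnot(all(df_Func_test == frequency_feature_df_RAW_ordered))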

Selected data frame based on Frequency for Output

Choose the mutually important features: a feature is kept when it appears in the top list of at least half of the models (i.e. 3 of the 5 models in our case).

The frequency / common feature importance is processed as follows (the same steps as above):

  1. Select the TOP Number of features for each model (this number is set via “Number_fea_input” in this session, Number_fea_input <- INPUT_NUMBER_FEATURES, with “INPUT_NUMBER_FEATURES” defined in the INPUT session).
  2. Calculate the frequency of appearance of each feature across the top lists selected in step 1.
  3. Each feature that appears in at least half of the models is considered important; these important features are collected as the common features.
n_select_frequencyWay <- Number_fea_input
df_feature_Output_frequency <- df_Selected_Frequency_Imp(Number_fea_input,
                                                         combined_importance_freq_ordered_df)
##            LRM XGB ENM RF SVM Total_Count
## PC1          1   1   1  1   1           5
## cg23432430   1   1   1  1   1           5
## cg09727210   1   1   1  1   1           5
## PC2          1   1   1  1   1           5
## cg00962106   1   1   1  1   1           5
## cg07158503   1   1   1  1   1           5
## cg06697310   1   1   1  1   1           5
## cg02225060   1   1   1  1   1           5
## cg09015880   1   1   1  1   1           5
## cg10701746   1   1   1  1   1           5
## cg16338321   1   1   1  1   1           5
## cg26081710   1   1   1  1   1           5
## cg00415024   1   1   1  1   1           5
## cg21757617   1   1   1  1   1           5
## cg14168080   1   1   1  1   1           5
## cg02887598   1   1   1  1   1           5
## cg05064044   1   1   1  1   1           5
## cg01910713   1   1   1  1   1           5
## cg11331837   1   1   1  1   1           5
## cg07504457   1   1   1  1   1           5
## cg00004073   1   1   1  1   1           5
## cg04156077   1   1   1  1   1           5
## cg10738648   1   1   1  1   1           5
## cg07640670   1   1   1  1   1           5
## cg16858433   1   1   1  1   1           5
## cg12543766   1   1   1  1   1           5
## cg20685672   1   1   1  1   1           5
## cg24851651   1   1   1  1   1           5
## cg20678988   1   1   1  1   1           5
## cg03088219   1   1   1  1   1           5
## cg16536985   1   1   1  1   1           5
## cg05234269   1   1   1  1   1           5
## cg18285382   1   1   1  1   1           5
## cg09216282   1   1   1  1   1           5
## cg00084271   1   1   1  1   1           5
## cg21697769   1   1   1  1   1           5
## cg15098922   1   1   1  1   1           5
## cg27577781   1   1   1  1   1           5
## cg18150287   1   1   1  1   1           5
## cg08096656   1   1   1  1   1           5
## cg19503462   1   1   1  1   1           5
## cg07634717   1   1   1  1   1           5
## cg26853071   1   1   1  1   1           5
## cg09247979   1   1   1  1   1           5
## cg00154902   1   1   1  1   1           5
## cg15184869   1   1   1  1   1           5
## cg19471911   1   1   1  1   1           5
## cg12702014   1   1   1  1   1           5
## cg03979311   1   1   1  1   1           5
## cg11787167   1   1   1  1   1           5
## cg18857647   1   1   1  1   1           5
## cg11540596   1   1   1  1   1           5
## cg25712921   1   1   1  1   1           5
## cg12240569   1   1   1  1   1           5
## cg19301366   1   1   1  1   1           5
## cg25436480   1   1   1  1   1           5
## cg13387643   1   1   1  1   1           5
## cg12421087   1   1   1  1   1           5
## cg11227702   1   1   1  1   1           5
## cg00648024   1   1   1  1   1           5
## cg17002719   1   1   1  1   1           5
## cg15633912   1   1   1  1   1           5
## cg16715186   1   1   1  1   1           5
## cg11019791   1   1   1  1   1           5
## cg06880438   1   1   1  1   1           5
## cg03660162   1   1   1  1   1           5
## cg01008088   1   1   1  1   1           5
## cg15535896   1   1   1  1   1           5
## cg15600437   1   1   1  1   1           5
## cg02078724   1   1   1  1   1           5
## cg20823859   1   1   1  1   1           5
## cg13372276   1   1   1  1   1           5
## cg25208881   1   1   1  1   1           5
## cg26679884   1   1   1  1   1           5
## cg01921484   1   1   1  1   1           5
## cg06960717   1   1   1  1   1           5
## cg25169289   1   1   1  1   1           5
## cg08584917   1   1   1  1   1           5
## cg22305850   1   1   1  1   1           5
## cg11133939   1   1   1  1   1           5
## cg01608425   1   1   1  1   1           5
## cg06371647   1   1   1  1   1           5
## cg03749159   1   1   1  1   1           5
## cg24697433   1   1   1  1   1           5
## cg21986118   1   1   1  1   1           5
## cg18816397   1   1   1  1   1           5
## cg01128042   1   1   1  1   1           5
## cg15700429   1   1   1  1   1           5
## cg25277809   1   1   1  1   1           5
## cg22931151   1   1   1  1   1           5
## cg24634455   1   1   1  1   1           5
## cg13405878   1   1   1  1   1           5
## cg02932958   1   1   1  1   1           5
## cg11286989   1   1   1  1   1           5
## cg05593887   1   1   1  1   1           5
## cg18918831   1   1   1  1   1           5
## cg11247378   1   1   1  1   1           5
## cg24139837   1   1   1  1   1           5
## cg17042243   1   1   1  1   1           5
## cg25879395   1   1   1  1   1           5
## cg18029737   1   1   1  1   1           5
## cg10681981   1   1   1  1   1           5
## cg26846609   1   1   1  1   1           5
## cg14293999   1   1   1  1   1           5
## cg10240127   1   1   1  1   1           5
## cg08198851   1   1   1  1   1           5
## cg18993517   1   1   1  1   1           5
## cg02823329   1   1   1  1   1           5
## cg08745107   1   1   1  1   1           5
## cg13573375   1   1   1  1   1           5
## cg17738613   1   1   1  1   1           5
## cg02356645   1   1   1  1   1           5
## cg05876883   1   1   1  1   1           5
## cg24883219   1   1   1  1   1           5
## cg00696044   1   1   1  1   1           5
## cg17131279   1   1   1  1   1           5
## cg08041188   1   1   1  1   1           5
## cg24307368   1   1   1  1   1           5
## cg06961873   1   1   1  1   1           5
## cg05392160   1   1   1  1   1           5
## cg26983017   1   1   1  1   1           5
## cg07138269   1   1   1  1   1           5
## cg04316537   1   1   1  1   1           5
## cg27224751   1   1   1  1   1           5
## cg04831745   1   1   1  1   1           5
## cg12556569   1   1   1  1   1           5
## cg17386240   1   1   1  1   1           5
## cg04412904   1   1   1  1   1           5
## cg00345083   1   1   1  1   1           5
## cg02668233   1   1   1  1   1           5
## cg10788927   1   1   1  1   1           5
## cg14687298   1   1   1  1   1           5
## cg14170504   1   1   1  1   1           5
## cg03672288   1   1   1  1   1           5
## cg14307563   1   1   1  1   1           5
## cg09451339   1   1   1  1   1           5
## cg16431720   1   1   1  1   1           5
## cg01662749   1   1   1  1   1           5
## cg02495179   1   1   1  1   1           5
## cg04768387   1   1   1  1   1           5
## cg17002338   1   1   1  1   1           5
## cg01933473   1   1   1  1   1           5
## cg16089727   1   1   1  1   1           5
## cg24643105   1   1   1  1   1           5
## PC3          1   0   1  1   1           4
## cg00819121   1   0   1  1   1           4
## cg09120722   1   1   1  0   1           4
## cg27272246   1   1   1  0   1           4
## cg06277607   1   1   1  1   0           4
## cg03982462   1   0   1  1   1           4
## cg09584650   1   1   1  1   0           4
## cg08788093   1   1   1  0   1           4
## cg22666875   1   1   1  1   0           4
## cg22542451   1   1   1  0   1           4
## cg00939409   1   1   1  0   1           4
## cg17723206   1   1   1  0   1           4
## cg05321907   1   0   1  1   1           4
## cg12776173   1   0   1  1   1           4
## cg25758034   1   1   1  0   1           4
## cg14710850   1   0   1  1   1           4
## cg23517115   1   0   1  1   1           4
## cg17429539   1   1   1  0   1           4
## cg17906851   1   1   1  1   0           4
## cg00512739   1   0   1  1   1           4
## cg12689021   1   0   1  1   1           4
## cg16571124   1   1   1  0   1           4
##  [ reached 'max' / getOption("max.print") -- omitted 145 rows ]
Combine with the importance data frame
all_out_features <- union(combined_importance_freq_ordered_df$Feature, rownames(df_feature_Output_frequency))
# Note: the combined importance table used here is the one before filtering.
# Combine the tables based on the common feature selection method.
# If a feature from the previous importance table is absent here,
# add it with value zero.
feature_output_df_full <- data.frame(Feature = all_out_features)
feature_output_df_full <- merge(feature_output_df_full, df_feature_Output_frequency, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_output_df_full[is.na(feature_output_df_full)] <- 0


# Align the averaged-importance table with the full output feature list
all_output_impAvg_ordered_full <- data.frame(Feature = all_out_features)
all_output_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,
                                        all_output_impAvg_ordered_full, 
                                        by.x = "Feature", 
                                        by.y = "Feature", 
                                        all.x = TRUE)
all_output_impAvg_ordered_full[is.na(all_output_impAvg_ordered_full)] <- 0
all_Output_combined_df_impAvg <- merge(feature_output_df_full, 
                                all_output_impAvg_ordered_full, 
                                by = "Feature", 
                                all = TRUE)

print(head(feature_output_df_full))
##      Feature LRM XGB ENM RF SVM Total_Count
## 1    age.now   0   1   0  1   1           3
## 2 cg00004073   1   1   1  1   1           5
## 3 cg00084271   1   1   1  1   1           5
## 4 cg00086247   0   1   0  1   0           2
## 5 cg00146240   1   0   1  1   1           4
## 6 cg00154902   1   1   1  1   1           5
print(head(all_output_impAvg_ordered_full))
##      Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now      0.00416345     0.63497444     0.003392297     0.5289698      0.5000000          0.3343000
## 2 cg00004073      0.26934952     0.25232114     0.368743031     0.3932005      0.3333333          0.3233895
## 3 cg00084271      0.22358222     0.08451443     0.272790932     0.5066986      0.1666667          0.2508506
## 4 cg00086247      0.00000000     0.15625070     0.068094153     0.2757605      0.0000000          0.1000211
## 5 cg00146240      0.08729337     0.00000000     0.195466203     0.5233594      0.1666667          0.1945571
## 6 cg00154902      0.20586269     0.35714894     0.223748437     0.4724526      0.3333333          0.3185092
print(head(all_Output_combined_df_impAvg))
##      Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1    age.now   0   1   0  1   1           3      0.00416345     0.63497444     0.003392297     0.5289698      0.5000000          0.3343000
## 2 cg00004073   1   1   1  1   1           5      0.26934952     0.25232114     0.368743031     0.3932005      0.3333333          0.3233895
## 3 cg00084271   1   1   1  1   1           5      0.22358222     0.08451443     0.272790932     0.5066986      0.1666667          0.2508506
## 4 cg00086247   0   1   0  1   0           2      0.00000000     0.15625070     0.068094153     0.2757605      0.0000000          0.1000211
## 5 cg00146240   1   0   1  1   1           4      0.08729337     0.00000000     0.195466203     0.5233594      0.1666667          0.1945571
## 6 cg00154902   1   1   1  1   1           5      0.20586269     0.35714894     0.223748437     0.4724526      0.3333333          0.3185092
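As a toy sketch of the zero-fill merge pattern used above (hypothetical feature names only; merging by row names exactly as in the pipeline code):

# Toy illustration only: a full feature list merged with a partial
# indicator table; features absent from the table are zero-filled
full <- data.frame(Feature = c("cgA", "cgB", "cgC"))
part <- data.frame(row.names = c("cgA", "cgC"), LRM = c(1L, 0L))
m <- merge(full, part, by.x = "Feature", by.y = "row.names", all.x = TRUE)
m[is.na(m)] <- 0
# m now lists cgA with LRM = 1, and cgB (filled) and cgC with LRM = 0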
Frequency Feature Selection

Choose the mutually important features: a feature is kept when it appears in the top-selected important-feature lists of at least half of the models (i.e. 3 of the 5 models in our case). The branches below differ only in which processed data objects they index; a consolidated sketch follows after them.

if(METHOD_FEATURE_FLAG == 6){
df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])

df_process_Output_freq<-processed_data_m6_df[,c("DX",df_process_frequency_FeatureName)]

output_Frequency_Feature <- processed_data_m6[,c("DX",df_process_frequency_FeatureName)]

print(head(output_Frequency_Feature))

print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName) ))

print(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
}
if(METHOD_FEATURE_FLAG == 5){
df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])

df_process_Output_freq<-processed_data_m5_df[,c("DX",df_process_frequency_FeatureName)]

output_Frequency_Feature <- processed_data_m5[,c("DX",df_process_frequency_FeatureName)]

print(head(output_Frequency_Feature))

print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName) ))

print(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
}
if(METHOD_FEATURE_FLAG == 4){
df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])

df_process_Output_freq<-processed_data_m4_df[,c("DX",df_process_frequency_FeatureName)]

output_Frequency_Feature <- processed_data_m4[,c("DX",df_process_frequency_FeatureName)]

print(head(output_Frequency_Feature))

print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName) ))

print(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
}
if(METHOD_FEATURE_FLAG==3){
df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])

df_process_Output_freq<-processed_data_m3_df[,c("DX",df_process_frequency_FeatureName)]

output_Frequency_Feature <- processed_data_m3[,c("DX",df_process_frequency_FeatureName)]

print(head(output_Frequency_Feature))

print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName) ))

print(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
}
## # A tibble: 6 × 272
##   DX         PC1 cg23432430 cg09727210        PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080 cg02887598 cg05064044
##   <fct>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CI    -0.214        0.948      0.424  0.0147         0.912      0.578      0.845      0.683      0.510      0.480      0.535      0.875      0.430     0.0365      0.419     0.0402     0.567 
## 2 CN    -0.173        0.946      0.881  0.0575         0.538      0.620      0.865      0.827      0.840      0.487      0.829      0.920      0.400     0.443       0.442     0.671      0.536 
## 3 CN    -0.00367      0.942      0.849  0.0837         0.504      0.624      0.241      0.521      0.847      0.493      0.492      0.880      0.747     0.447       0.436     0.734      0.527 
## 4 CI    -0.187        0.943      0.842 -0.0112         0.904      0.599      0.848      0.808      0.487      0.855      0.525      0.915      0.770     0.434       0.957     0.864      0.628 
## 5 CI     0.0268       0.946      0.425  0.0000165      0.896      0.631      0.821      0.608      0.889      0.488      0.842      0.917      0.742     0.747       0.946     0.836      0.566 
## 6 CN    -0.0379       0.951      0.460  0.0157         0.886      0.615      0.784      0.764      0.906      0.842      0.842      0.923      0.761     0.774       0.399     0.412      0.0830
## # ℹ 254 more variables: cg01910713 <dbl>, cg11331837 <dbl>, cg07504457 <dbl>, cg00004073 <dbl>, cg04156077 <dbl>, cg10738648 <dbl>, cg07640670 <dbl>, cg16858433 <dbl>, cg12543766 <dbl>,
## #   cg20685672 <dbl>, cg24851651 <dbl>, cg20678988 <dbl>, cg03088219 <dbl>, cg16536985 <dbl>, cg05234269 <dbl>, cg18285382 <dbl>, cg09216282 <dbl>, cg00084271 <dbl>, cg21697769 <dbl>,
## #   cg15098922 <dbl>, cg27577781 <dbl>, cg18150287 <dbl>, cg08096656 <dbl>, cg19503462 <dbl>, cg07634717 <dbl>, cg26853071 <dbl>, cg09247979 <dbl>, cg00154902 <dbl>, cg15184869 <dbl>,
## #   cg19471911 <dbl>, cg12702014 <dbl>, cg03979311 <dbl>, cg11787167 <dbl>, cg18857647 <dbl>, cg11540596 <dbl>, cg25712921 <dbl>, cg12240569 <dbl>, cg19301366 <dbl>, cg25436480 <dbl>,
## #   cg13387643 <dbl>, cg12421087 <dbl>, cg11227702 <dbl>, cg00648024 <dbl>, cg17002719 <dbl>, cg15633912 <dbl>, cg16715186 <dbl>, cg11019791 <dbl>, cg06880438 <dbl>, cg03660162 <dbl>,
## #   cg01008088 <dbl>, cg15535896 <dbl>, cg15600437 <dbl>, cg02078724 <dbl>, cg20823859 <dbl>, cg13372276 <dbl>, cg25208881 <dbl>, cg26679884 <dbl>, cg01921484 <dbl>, cg06960717 <dbl>,
## #   cg25169289 <dbl>, cg08584917 <dbl>, cg22305850 <dbl>, cg11133939 <dbl>, cg01608425 <dbl>, cg06371647 <dbl>, cg03749159 <dbl>, cg24697433 <dbl>, cg21986118 <dbl>, cg18816397 <dbl>, …
## [1] "The number of final used features of common importance method: 271"
##   [1] "PC1"        "cg23432430" "cg09727210" "PC2"        "cg00962106" "cg07158503" "cg06697310" "cg02225060" "cg09015880" "cg10701746" "cg16338321" "cg26081710" "cg00415024" "cg21757617" "cg14168080"
##  [16] "cg02887598" "cg05064044" "cg01910713" "cg11331837" "cg07504457" "cg00004073" "cg04156077" "cg10738648" "cg07640670" "cg16858433" "cg12543766" "cg20685672" "cg24851651" "cg20678988" "cg03088219"
##  [31] "cg16536985" "cg05234269" "cg18285382" "cg09216282" "cg00084271" "cg21697769" "cg15098922" "cg27577781" "cg18150287" "cg08096656" "cg19503462" "cg07634717" "cg26853071" "cg09247979" "cg00154902"
##  [46] "cg15184869" "cg19471911" "cg12702014" "cg03979311" "cg11787167" "cg18857647" "cg11540596" "cg25712921" "cg12240569" "cg19301366" "cg25436480" "cg13387643" "cg12421087" "cg11227702" "cg00648024"
##  [61] "cg17002719" "cg15633912" "cg16715186" "cg11019791" "cg06880438" "cg03660162" "cg01008088" "cg15535896" "cg15600437" "cg02078724" "cg20823859" "cg13372276" "cg25208881" "cg26679884" "cg01921484"
##  [76] "cg06960717" "cg25169289" "cg08584917" "cg22305850" "cg11133939" "cg01608425" "cg06371647" "cg03749159" "cg24697433" "cg21986118" "cg18816397" "cg01128042" "cg15700429" "cg25277809" "cg22931151"
##  [91] "cg24634455" "cg13405878" "cg02932958" "cg11286989" "cg05593887" "cg18918831" "cg11247378" "cg24139837" "cg17042243" "cg25879395" "cg18029737" "cg10681981" "cg26846609" "cg14293999" "cg10240127"
## [106] "cg08198851" "cg18993517" "cg02823329" "cg08745107" "cg13573375" "cg17738613" "cg02356645" "cg05876883" "cg24883219" "cg00696044" "cg17131279" "cg08041188" "cg24307368" "cg06961873" "cg05392160"
## [121] "cg26983017" "cg07138269" "cg04316537" "cg27224751" "cg04831745" "cg12556569" "cg17386240" "cg04412904" "cg00345083" "cg02668233" "cg10788927" "cg14687298" "cg14170504" "cg03672288" "cg14307563"
## [136] "cg09451339" "cg16431720" "cg01662749" "cg02495179" "cg04768387" "cg17002338" "cg01933473" "cg16089727" "cg24643105" "PC3"        "cg00819121" "cg09120722" "cg27272246" "cg06277607" "cg03982462"
## [151] "cg09584650" "cg08788093" "cg22666875" "cg22542451" "cg00939409" "cg17723206" "cg05321907" "cg12776173" "cg25758034" "cg14710850" "cg23517115" "cg17429539" "cg17906851" "cg00512739" "cg12689021"
## [166] "cg16571124" "cg22071943" "cg25649515" "cg04497611" "cg15730644" "cg13739190" "cg25306893" "cg16779438" "cg06483046" "cg14780448" "cg06833284" "cg14507637" "cg18819889" "cg03549208" "cg15985500"
## [181] "cg05161773" "cg06403901" "cg22169467" "cg08857872" "cg11187460" "cg03600007" "cg05850457" "cg06715136" "cg10091792" "cg03221390" "cg02122327" "cg21139150" "cg14192979" "cg23352245" "cg00146240"
## [196] "cg20981163" "cg27160885" "cg00553601" "cg12146221" "cg13226272" "cg22112152" "cg23836570" "cg08554146" "cg09785377" "cg01462799" "cg06118351" "cg17129965" "cg18339359" "cg11438323" "cg00295418"
## [211] "cg08896901" "cg18526121" "cg02550738" "cg04664583" "cg07028768" "cg01549082" "cg13815695" "cg02627240" "cg19799454" "cg06864789" "cg03737947" "cg14532717" "cg22535849" "cg04718469" "cg14627380"
## [226] "cg10039445" "cg02631626" "cg20673830" "cg17268094" "cg11706829" "cg16733676" "cg20078646" "cg13368637" "cg16652920" "cg26901661" "cg04888234" "cg04242342" "cg00322820" "cg23066280" "cg07480955"
## [241] "cg02772171" "cg21243064" "cg21388339" "cg01153376" "cg15775217" "cg02621446" "cg10666341" "cg23177161" "cg02246922" "cg25174111" "cg00322003" "cg15586958" "cg06231502" "age.now"    "cg18949721"
## [256] "cg12228670" "cg11314779" "cg23916408" "cg01280698" "cg04124201" "cg12784167" "cg04645024" "cg16202259" "cg11268585" "cg15501526" "cg03084184" "cg12333628" "cg21783012" "cg13038195" "cg04867412"
## [271] "cg20803293"
##                     DX          PC1 cg23432430 cg09727210        PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080
## 200223270003_R02C01 CI -0.214185447  0.9482702  0.4240111 0.01470293  0.9124898  0.5777146  0.8454609  0.6828159  0.5101716  0.4795503  0.5350242  0.8751040  0.4299553 0.03652647  0.4190123
## 200223270003_R03C01 CN -0.172761185  0.9455418  0.8812928 0.05745834  0.5375751  0.6203543  0.8653044  0.8265195  0.8402106  0.4868342  0.8294062  0.9198212  0.3999122 0.44299089  0.4420256
## 200223270003_R06C01 CN -0.003667305  0.9418716  0.8493743 0.08372861  0.5040948  0.6236025  0.2405168  0.5209552  0.8472063  0.4927257  0.4918708  0.8801892  0.7465084 0.44725379  0.4355521
##                     cg02887598 cg05064044 cg01910713 cg11331837 cg07504457 cg00004073 cg04156077 cg10738648 cg07640670 cg16858433 cg12543766 cg20685672 cg24851651 cg20678988  cg03088219 cg16536985
## 200223270003_R02C01 0.04020908  0.5672851  0.8573169 0.03692842  0.7116230 0.02928535  0.7321883 0.44931577 0.58296513  0.9184356 0.51028134  0.6712101 0.03674702  0.8438718 0.844002862  0.5789643
## 200223270003_R03C01 0.67073881  0.5358875  0.8538850 0.57150125  0.6854539 0.02787198  0.6865805 0.49894016 0.55225610  0.9194211 0.88741539  0.7932091 0.05358297  0.8548886 0.007435243  0.5418687
## 200223270003_R06C01 0.73408417  0.5273964  0.8110366 0.03182862  0.7205633 0.64576463  0.8501188 0.05552024 0.04058533  0.9271632 0.02818501  0.6613646 0.05968923  0.7786685 0.120155222  0.8392044
##                     cg05234269 cg18285382 cg09216282 cg00084271 cg21697769 cg15098922 cg27577781 cg18150287 cg08096656 cg19503462 cg07634717 cg26853071 cg09247979 cg00154902 cg15184869 cg19471911
## 200223270003_R02C01 0.93848584  0.3202927  0.9349248  0.8103611  0.8946108  0.9286092  0.8143535  0.7685695  0.9362594  0.7951675  0.7483382  0.4233820  0.5070956  0.5137741  0.8622328  0.6334393
## 200223270003_R03C01 0.57461229  0.2930577  0.9244259  0.7877006  0.2822953  0.9027517  0.8113185  0.7519166  0.9314878  0.4537684  0.8254434  0.7451354  0.5706177  0.8540746  0.8996252  0.8437175
## 200223270003_R06C01 0.02467208  0.8923595  0.9263996  0.7706165  0.8698740  0.8525611  0.8144274  0.2501173  0.4943033  0.6997359  0.8181246  0.4228079  0.5090215  0.8188126  0.8688117  0.6127952
##                     cg12702014 cg03979311 cg11787167 cg18857647 cg11540596 cg25712921 cg12240569 cg19301366 cg25436480 cg13387643 cg12421087 cg11227702 cg00648024 cg17002719 cg15633912 cg16715186
## 200223270003_R02C01  0.7704049 0.86644909 0.03853894  0.8582332  0.9238951  0.2829848 0.82772064  0.8831393  0.8425160  0.4229959  0.5647607 0.86486075 0.51410972 0.04939181  0.1605530  0.2742789
## 200223270003_R03C01  0.7848681 0.06199853 0.04673831  0.8394132  0.8926595  0.6220919 0.02690547  0.8072679  0.4994032  0.4200273  0.5399655 0.49184121 0.40202875 0.40466475  0.9333421  0.7946153
## 200223270003_R06C01  0.8065993 0.72615553 0.32564508  0.2647491  0.8820252  0.6384003 0.46030640  0.8796022  0.3494312  0.4161488  0.5400348 0.02543724 0.05579011 0.51428089  0.8737362  0.8124316
##                     cg11019791 cg06880438 cg03660162 cg01008088 cg15535896 cg15600437 cg02078724 cg20823859 cg13372276 cg25208881 cg26679884 cg01921484 cg06960717 cg25169289 cg08584917 cg22305850
## 200223270003_R02C01  0.8112324  0.8285145  0.8691767  0.8424817  0.3382952  0.4885353  0.3096774  0.9030711 0.04888111  0.1851956  0.6793815  0.9098550  0.7030978  0.1100884  0.5663205 0.03361934
## 200223270003_R03C01  0.7831231  0.7988881  0.5160770  0.2417656  0.9253926  0.4894487  0.2896133  0.6062985 0.62396373  0.9092286  0.1848705  0.9093137  0.7653402  0.7667174  0.9019732 0.57522232
## 200223270003_R06C01  0.4353250  0.7839538  0.9026304  0.2618620  0.3320191  0.8551374  0.2805612  0.8917348 0.59693465  0.9265502  0.1701734  0.9204487  0.7206218  0.2264993  0.9187789 0.58548744
##                     cg11133939 cg01608425 cg06371647 cg03749159 cg24697433 cg21986118 cg18816397 cg01128042 cg15700429 cg25277809 cg22931151 cg24634455 cg13405878 cg02932958 cg11286989 cg05593887
## 200223270003_R02C01  0.1282694  0.9030410  0.8336894  0.9355921  0.9243095  0.6658175  0.5472925  0.9113420  0.7879010  0.1632342  0.9311023  0.7796391  0.4549662  0.7901008  0.7590008  0.5939220
## 200223270003_R03C01  0.5920898  0.9264388  0.8198684  0.9153921  0.6808390  0.6571296  0.4940355  0.5328806  0.9114530  0.4913711  0.9356702  0.5188241  0.7858042  0.4210489  0.8533989  0.5766550
## 200223270003_R06C01  0.5127706  0.8887753  0.8069537  0.9255807  0.6384606  0.7034445  0.5337018  0.5222757  0.8838233  0.5952124  0.9328614  0.5325725  0.7583938  0.3825995  0.7313884  0.9148338
##                     cg18918831 cg11247378 cg24139837 cg17042243 cg25879395 cg18029737 cg10681981 cg26846609 cg14293999 cg10240127 cg08198851 cg18993517 cg02823329 cg08745107 cg13573375 cg17738613
## 200223270003_R02C01  0.4891660  0.1591185 0.07404605  0.2502905 0.88130864  0.9100454  0.7035090 0.48860949  0.2836710  0.9250553  0.6578905  0.2091538  0.9462397 0.02921338  0.8670419  0.6879612
## 200223270003_R03C01  0.5333801  0.7874849 0.04183445  0.2933475 0.02603438  0.9016634  0.7382662 0.04878986  0.9172023  0.9403255  0.6578186  0.2665896  0.6464005 0.78542320  0.1733934  0.6582258
## 200223270003_R06C01  0.6406575  0.4807942 0.05657120  0.2725457 0.91060615  0.7376586  0.6971989 0.48026945  0.9168166  0.9056974  0.1272153  0.2574003  0.9633930 0.02709928  0.8888246  0.1022257
##                     cg02356645 cg05876883 cg24883219 cg00696044 cg17131279 cg08041188 cg24307368 cg06961873 cg05392160 cg26983017 cg07138269 cg04316537 cg27224751 cg04831745 cg12556569 cg17386240
## 200223270003_R02C01  0.5105903  0.9039064  0.6430473 0.55608424  0.1900637  0.7752456 0.64323677  0.5335591  0.9328933 0.89868232  0.5002290  0.8074830 0.44503947 0.61984995 0.06218231  0.7473400
## 200223270003_R03C01  0.5833923  0.9223308  0.6822115 0.07552381  0.7048637  0.3201255 0.34980461  0.5472606  0.2576881 0.03145466  0.9426707  0.8453340 0.03214912 0.71214149 0.03924599  0.7144809
## 200223270003_R06C01  0.5701428  0.4697980  0.5296903 0.79270858  0.1492861  0.7900939 0.02720398  0.9415177  0.8920726 0.84677625  0.5057781  0.4351695 0.83123722 0.06871768 0.48636893  0.8074824
##                     cg04412904 cg00345083 cg02668233 cg10788927 cg14687298 cg14170504 cg03672288 cg14307563 cg09451339 cg16431720 cg01662749 cg02495179 cg04768387 cg17002338 cg01933473 cg16089727
## 200223270003_R02C01 0.05088595 0.47960968  0.4708431  0.8973154 0.04206702 0.54915621  0.9235592  0.1855966  0.2243746  0.7356099  0.3506201  0.6813307  0.3131047  0.9286251  0.2589014 0.86748697
## 200223270003_R03C01 0.07717659 0.50833875  0.8841930  0.2021398 0.14813581 0.02236650  0.6718625  0.8916957  0.2340702  0.8692449  0.2510946  0.7373055  0.9465814  0.2684163  0.6726133 0.54996692
## 200223270003_R06C01 0.08253743 0.03929249  0.4575646  0.2053075 0.24260002 0.02988245  0.9007629  0.8750052  0.8921284  0.8773137  0.8061480  0.5588114  0.9098563  0.2811103  0.2642560 0.05876736
##                     cg24643105          PC3 cg00819121 cg09120722 cg27272246 cg06277607 cg03982462 cg09584650 cg08788093 cg22666875 cg22542451 cg00939409 cg17723206 cg05321907 cg12776173 cg25758034
## 200223270003_R02C01  0.5303418 -0.014043316  0.9207001  0.5878977  0.8615873 0.10744587  0.8562777 0.08230254 0.03911678  0.8177182  0.5884356  0.2652180 0.92881042  0.2880477  0.1038804  0.6114028
## 200223270003_R03C01  0.5042688  0.005055871  0.9281472  0.8287506  0.8705287 0.09353494  0.6023731 0.09661586 0.60934160  0.8291957  0.8337068  0.8882671 0.48556255  0.1782629  0.8730635  0.6649219
## 200223270003_R06C01  0.9383050  0.029143653  0.9327211  0.8793344  0.8103777 0.09504696  0.8778458 0.52399749 0.88380243  0.3694180  0.8125084  0.8842646 0.01765023  0.8427929  0.7009491  0.2393844
##                     cg14710850 cg23517115 cg17429539 cg17906851 cg00512739 cg12689021 cg16571124 cg22071943 cg25649515 cg04497611 cg15730644 cg13739190 cg25306893 cg16779438 cg06483046 cg14780448
## 200223270003_R02C01  0.8048592  0.2151144  0.7860900  0.9488392  0.9337648  0.7706828  0.9282854  0.8705217  0.9279829  0.9086359  0.4803181  0.8510103  0.6265392  0.8826150 0.04383925  0.9119141
## 200223270003_R03C01  0.8090950  0.9131440  0.7100923  0.9529718  0.8863895  0.7449475  0.9206431  0.2442648  0.9235753  0.8818513  0.4353906  0.8358482  0.8330282  0.5466924 0.50720277  0.6702102
## 200223270003_R06C01  0.8285902  0.8328364  0.7660838  0.6462151  0.9242748  0.7872237  0.9276842  0.2644581  0.5895839  0.5853116  0.8763048  0.8419471  0.6175380  0.8629492 0.89604910  0.6207355
##                     cg06833284 cg14507637 cg18819889 cg03549208 cg15985500 cg05161773 cg06403901 cg22169467 cg08857872 cg11187460 cg03600007 cg05850457 cg06715136 cg10091792 cg03221390 cg02122327
## 200223270003_R02C01  0.9125144  0.9051258  0.9156157  0.9014487  0.8555262  0.4120912 0.92790690  0.3095010  0.3395280 0.03672179  0.5658487  0.8183013  0.3400192  0.8670733  0.5859063 0.38940091
## 200223270003_R03C01  0.9003482  0.9009460  0.9004455  0.8381784  0.8312198  0.4154907 0.04783341  0.2978585  0.8181845 0.92516409  0.6018832  0.8313023  0.9259109  0.5864221  0.9180706 0.37769608
## 200223270003_R06C01  0.6097933  0.9013686  0.9054439  0.9097817  0.8492103  0.8526849 0.05253626  0.8955853  0.2970779 0.03109553  0.8611166  0.8161364  0.9079807  0.6087997  0.6399867 0.04017909
##                     cg21139150 cg14192979 cg23352245 cg00146240 cg20981163 cg27160885 cg00553601 cg12146221 cg13226272 cg22112152 cg23836570 cg08554146 cg09785377 cg01462799 cg06118351 cg17129965
## 200223270003_R02C01 0.01853264 0.06336040  0.9377232  0.6336151  0.8990628  0.2231606 0.05601299  0.2049284 0.02637249  0.8476101 0.58688450  0.8982080  0.9162088  0.8284427  0.3633940  0.8972140
## 200223270003_R03C01 0.43223243 0.06019651  0.9375774  0.8957183  0.9264076  0.8263885 0.58957701  0.1814927 0.54100016  0.8014136 0.54259383  0.8963074  0.9226292  0.4038824  0.4714860  0.8806673
## 200223270003_R06C01 0.43772680 0.52114282  0.5932742  0.1433218  0.4874651  0.2121179 0.62426500  0.8619250 0.44370701  0.7897897 0.03267304  0.8213878  0.6405193  0.4676821  0.8655962  0.8857237
##                     cg18339359 cg11438323 cg00295418 cg08896901 cg18526121 cg02550738 cg04664583 cg07028768 cg01549082 cg13815695 cg02627240 cg19799454 cg06864789 cg03737947 cg14532717 cg22535849
## 200223270003_R02C01  0.8824858  0.4863471 0.44954665  0.3581911  0.4519781  0.6201457  0.5572814  0.4496851  0.2924138  0.9267057 0.66706843  0.9178930 0.05369415 0.91824910  0.5732280  0.8847704
## 200223270003_R03C01  0.9040272  0.8984559 0.48471295  0.2467071  0.4762313  0.9011727  0.5881190  0.8536078  0.7065693  0.6859729 0.57129408  0.9106247 0.46053125 0.92067153  0.1107638  0.8609966
## 200223270003_R06C01  0.8552121  0.8722772 0.02004532  0.9225209  0.4833367  0.9085849  0.9352717  0.8356936  0.2895440  0.6509046 0.05309659  0.9066551 0.87513655 0.03638091  0.6273416  0.8808022
##                     cg04718469 cg14627380 cg10039445 cg02631626 cg20673830 cg17268094 cg11706829 cg16733676 cg20078646 cg13368637 cg16652920 cg26901661 cg04888234 cg04242342 cg00322820 cg23066280
## 200223270003_R02C01  0.8687522  0.9455369  0.8833873  0.6280766  0.2422052  0.5774753  0.8897234  0.9057228 0.06198170  0.5597507  0.9436000  0.8951971  0.8379655  0.8206769  0.4869764 0.07247841
## 200223270003_R03C01  0.7256813  0.9258964  0.8954055  0.1951736  0.6881735  0.9003262  0.5444785  0.8904541 0.89537412  0.9100088  0.9431222  0.8754981  0.4376314  0.8167892  0.4858988 0.57174588
## 200223270003_R06C01  0.8521881  0.5789898  0.8832807  0.2699849  0.2134634  0.8789368  0.5669449  0.1698111 0.08725521  0.8739205  0.9457161  0.9021064  0.8039047  0.8040357  0.4754313 0.80814756
##                     cg07480955 cg02772171 cg21243064 cg21388339 cg01153376 cg15775217 cg02621446 cg10666341 cg23177161 cg02246922 cg25174111 cg00322003 cg15586958 cg06231502 age.now cg18949721
## 200223270003_R02C01  0.3874638  0.9182018  0.5191606  0.2756268  0.4872148  0.5707441  0.8731313  0.9046648  0.4151698  0.7301201  0.8526503  0.1759911  0.9058263  0.7784451    82.4  0.2334245
## 200223270003_R03C01  0.3916889  0.5660559  0.9167649  0.2102269  0.9639670  0.9168327  0.8095534  0.6731062  0.4586576  0.9447019  0.8573844  0.5702070  0.8957526  0.7964278    78.6  0.2437792
## 200223270003_R06C01  0.4043390  0.8995479  0.4862205  0.7649181  0.2242410  0.6042521  0.7511582  0.6443180  0.8287312  0.7202230  0.2567745  0.3077122  0.9121763  0.7706160    80.4  0.2523095
##                     cg12228670 cg11314779 cg23916408 cg01280698 cg04124201 cg12784167 cg04645024 cg16202259 cg11268585 cg15501526 cg03084184 cg12333628 cg21783012 cg13038195 cg04867412 cg20803293
## 200223270003_R02C01  0.8632174  0.0242134  0.1942275  0.8985067  0.8686421 0.81503498  0.7366541  0.9548726  0.2521544  0.6362531  0.8162981  0.9227884  0.9142369 0.45882213 0.04304823 0.54933918
## 200223270003_R03C01  0.8496212  0.8966100  0.9154993  0.8846201  0.3308589 0.02811410  0.8454827  0.3713483  0.8535791  0.6319253  0.7877128  0.9092861  0.6694884 0.02740132 0.87967997 0.07935747
## 200223270003_R06C01  0.8738949  0.8908661  0.8886255  0.8847132  0.3241613 0.03073269  0.0871902  0.4852461  0.9121931  0.7435100  0.4546397  0.5084647  0.9070112 0.46284376 0.44971146 0.42191244
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
if(METHOD_FEATURE_FLAG==1){
  # Keep features selected by at least 3 of the 5 models
  # (the "appears more than half the time" rule for common features)
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])

  df_process_Output_freq <- processed_data_m1_df[,c("DX",df_process_frequency_FeatureName)]

  output_Frequency_Feature <- processed_data_m1[,c("DX",df_process_frequency_FeatureName)]

  print(head(output_Frequency_Feature))

  print(paste("The number of features finally kept by the common importance method:", length(df_process_frequency_FeatureName)))

  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
print(df_process_frequency_FeatureName)
##   [1] "PC1"        "cg23432430" "cg09727210" "PC2"        "cg00962106" "cg07158503" "cg06697310" "cg02225060" "cg09015880" "cg10701746" "cg16338321" "cg26081710" "cg00415024" "cg21757617" "cg14168080"
##  [16] "cg02887598" "cg05064044" "cg01910713" "cg11331837" "cg07504457" "cg00004073" "cg04156077" "cg10738648" "cg07640670" "cg16858433" "cg12543766" "cg20685672" "cg24851651" "cg20678988" "cg03088219"
##  [31] "cg16536985" "cg05234269" "cg18285382" "cg09216282" "cg00084271" "cg21697769" "cg15098922" "cg27577781" "cg18150287" "cg08096656" "cg19503462" "cg07634717" "cg26853071" "cg09247979" "cg00154902"
##  [46] "cg15184869" "cg19471911" "cg12702014" "cg03979311" "cg11787167" "cg18857647" "cg11540596" "cg25712921" "cg12240569" "cg19301366" "cg25436480" "cg13387643" "cg12421087" "cg11227702" "cg00648024"
##  [61] "cg17002719" "cg15633912" "cg16715186" "cg11019791" "cg06880438" "cg03660162" "cg01008088" "cg15535896" "cg15600437" "cg02078724" "cg20823859" "cg13372276" "cg25208881" "cg26679884" "cg01921484"
##  [76] "cg06960717" "cg25169289" "cg08584917" "cg22305850" "cg11133939" "cg01608425" "cg06371647" "cg03749159" "cg24697433" "cg21986118" "cg18816397" "cg01128042" "cg15700429" "cg25277809" "cg22931151"
##  [91] "cg24634455" "cg13405878" "cg02932958" "cg11286989" "cg05593887" "cg18918831" "cg11247378" "cg24139837" "cg17042243" "cg25879395" "cg18029737" "cg10681981" "cg26846609" "cg14293999" "cg10240127"
## [106] "cg08198851" "cg18993517" "cg02823329" "cg08745107" "cg13573375" "cg17738613" "cg02356645" "cg05876883" "cg24883219" "cg00696044" "cg17131279" "cg08041188" "cg24307368" "cg06961873" "cg05392160"
## [121] "cg26983017" "cg07138269" "cg04316537" "cg27224751" "cg04831745" "cg12556569" "cg17386240" "cg04412904" "cg00345083" "cg02668233" "cg10788927" "cg14687298" "cg14170504" "cg03672288" "cg14307563"
## [136] "cg09451339" "cg16431720" "cg01662749" "cg02495179" "cg04768387" "cg17002338" "cg01933473" "cg16089727" "cg24643105" "PC3"        "cg00819121" "cg09120722" "cg27272246" "cg06277607" "cg03982462"
## [151] "cg09584650" "cg08788093" "cg22666875" "cg22542451" "cg00939409" "cg17723206" "cg05321907" "cg12776173" "cg25758034" "cg14710850" "cg23517115" "cg17429539" "cg17906851" "cg00512739" "cg12689021"
## [166] "cg16571124" "cg22071943" "cg25649515" "cg04497611" "cg15730644" "cg13739190" "cg25306893" "cg16779438" "cg06483046" "cg14780448" "cg06833284" "cg14507637" "cg18819889" "cg03549208" "cg15985500"
## [181] "cg05161773" "cg06403901" "cg22169467" "cg08857872" "cg11187460" "cg03600007" "cg05850457" "cg06715136" "cg10091792" "cg03221390" "cg02122327" "cg21139150" "cg14192979" "cg23352245" "cg00146240"
## [196] "cg20981163" "cg27160885" "cg00553601" "cg12146221" "cg13226272" "cg22112152" "cg23836570" "cg08554146" "cg09785377" "cg01462799" "cg06118351" "cg17129965" "cg18339359" "cg11438323" "cg00295418"
## [211] "cg08896901" "cg18526121" "cg02550738" "cg04664583" "cg07028768" "cg01549082" "cg13815695" "cg02627240" "cg19799454" "cg06864789" "cg03737947" "cg14532717" "cg22535849" "cg04718469" "cg14627380"
## [226] "cg10039445" "cg02631626" "cg20673830" "cg17268094" "cg11706829" "cg16733676" "cg20078646" "cg13368637" "cg16652920" "cg26901661" "cg04888234" "cg04242342" "cg00322820" "cg23066280" "cg07480955"
## [241] "cg02772171" "cg21243064" "cg21388339" "cg01153376" "cg15775217" "cg02621446" "cg10666341" "cg23177161" "cg02246922" "cg25174111" "cg00322003" "cg15586958" "cg06231502" "age.now"    "cg18949721"
## [256] "cg12228670" "cg11314779" "cg23916408" "cg01280698" "cg04124201" "cg12784167" "cg04645024" "cg16202259" "cg11268585" "cg15501526" "cg03084184" "cg12333628" "cg21783012" "cg13038195" "cg04867412"
## [271] "cg20803293"
Importance of these features:
Selected_Frequency_Feature_importance <-all_Output_combined_df_impAvg[all_Output_combined_df_impAvg$Total_Count>=3,]
print(Selected_Frequency_Feature_importance)
##       Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1     age.now   0   1   0  1   1           3     0.004163450    0.634974445     0.003392297     0.5289698      0.5000000          0.3343000
## 2  cg00004073   1   1   1  1   1           5     0.269349519    0.252321139     0.368743031     0.3932005      0.3333333          0.3233895
## 3  cg00084271   1   1   1  1   1           5     0.223582217    0.084514434     0.272790932     0.5066986      0.1666667          0.2508506
## 5  cg00146240   1   0   1  1   1           4     0.087293373    0.000000000     0.195466203     0.5233594      0.1666667          0.1945571
## 6  cg00154902   1   1   1  1   1           5     0.205862688    0.357148936     0.223748437     0.4724526      0.3333333          0.3185092
## 7  cg00295418   1   0   1  1   1           4     0.052680354    0.000000000     0.126220928     0.2779946      0.3333333          0.1580459
## 8  cg00322003   1   0   1  0   1           3     0.026025136    0.000000000     0.170919274     0.1854808      0.5000000          0.1764850
## 9  cg00322820   1   0   1  0   1           3     0.082466488    0.000000000     0.112276903     0.1971691      0.3333333          0.1450492
## 10 cg00345083   1   1   1  1   1           5     0.067531780    0.007214598     0.125533467     0.3843604      0.1666667          0.1502614
## 11 cg00415024   1   1   1  1   1           5     0.312750147    0.191638841     0.368625491     0.6225654      0.1666667          0.3324493
## 12 cg00512739   1   0   1  1   1           4     0.182276652    0.000000000     0.247625185     0.2915734      0.3333333          0.2109617
## 13 cg00553601   1   1   1  1   0           4     0.082411954    0.047507352     0.118912401     0.2530610      0.0000000          0.1003786
## 14 cg00648024   1   1   1  1   1           5     0.188256686    0.138389121     0.282665821     0.3302662      0.3333333          0.2545822
## 15 cg00696044   1   1   1  1   1           5     0.086245021    0.155116397     0.210684436     0.4958086      0.1666667          0.2229042
## 17 cg00819121   1   0   1  1   1           4     0.324605878    0.000000000     0.375766891     0.2925906      0.1666667          0.2319260
## 18 cg00939409   1   1   1  0   1           4     0.215333810    0.096720545     0.280167155     0.2332442      0.1666667          0.1984265
## 19 cg00962106   1   1   1  1   1           5     0.411641806    0.324339586     0.520231601     0.2711677      0.8333333          0.4721428
## 20 cg01008088   1   1   1  1   1           5     0.176892321    0.334393663     0.213601915     0.6712800      0.1666667          0.3125669
## 21 cg01128042   1   1   1  1   1           5     0.130128410    0.305711493     0.206130764     0.5172017      0.3333333          0.2985011
## 22 cg01153376   1   1   0  1   0           3     0.054613673    0.064932923     0.093178458     0.2663037      0.1666667          0.1291391
## 23 cg01280698   0   1   0  1   1           3     0.000000000    0.171928832     0.068309707     0.3119038      0.6666667          0.2437618
## 24 cg01462799   1   1   1  1   0           4     0.067347244    0.017976135     0.112488167     0.2345782      0.1666667          0.1198113
## 25 cg01549082   1   1   1  1   0           4     0.036089818    0.289658654     0.116966623     0.4562195      0.0000000          0.1797869
## 26 cg01608425   1   1   1  1   1           5     0.143336624    0.224704929     0.176798440     0.2616919      0.3333333          0.2279731
## 27 cg01662749   1   1   1  1   1           5     0.046585538    0.003576880     0.169671947     0.4083134      0.3333333          0.1922962
## 28 cg01910713   1   1   1  1   1           5     0.288793899    0.036071458     0.346939108     0.2852783      0.3333333          0.2580832
## 29 cg01921484   1   1   1  1   1           5     0.151192434    0.022248999     0.314260237     0.4975570      0.3333333          0.2637184
## 30 cg01933473   1   1   1  1   1           5     0.032966179    0.188410259     0.131296579     0.3895250      0.3333333          0.2151063
## 31 cg02078724   1   1   1  1   1           5     0.170470017    0.100360545     0.230071851     0.4201033      0.5000000          0.2842011
## 32 cg02122327   1   1   1  1   0           4     0.101980457    0.405336662     0.186377937     0.3776111      0.0000000          0.2142612
## 33 cg02225060   1   1   1  1   1           5     0.354934811    0.014823116     0.478455851     0.5342408      0.1666667          0.3098243
## 34 cg02246922   1   0   0  1   1           3     0.030685944    0.000000000     0.101537140     0.3343261      0.3333333          0.1599765
## 35 cg02356645   1   1   1  1   1           5     0.089543386    0.067243906     0.111478962     0.2928698      0.5000000          0.2122272
## 37 cg02495179   1   1   1  1   1           5     0.041302003    0.182002026     0.131585203     0.3799953      0.3333333          0.2136436
## 38 cg02550738   1   1   1  0   1           4     0.042141616    0.020463221     0.118472357     0.1533466      0.3333333          0.1335514
## 39 cg02621446   1   1   0  1   0           3     0.049579364    0.031024055     0.094051394     0.2771201      0.1666667          0.1236883
## 40 cg02627240   1   1   0  1   1           4     0.025034816    0.291605934     0.085034592     0.3097444      0.1666667          0.1756173
## 41 cg02631626   0   1   1  1   1           4     0.000000000    0.018210136     0.164542120     0.2789742      0.3333333          0.1590120
## 43 cg02668233   1   1   1  1   1           5     0.065997478    0.031214253     0.171125735     0.4860534      0.3333333          0.2175448
## 44 cg02772171   1   1   1  0   0           3     0.074944907    0.008455338     0.136723512     0.1987788      0.1666667          0.1171138
## 45 cg02823329   1   1   1  1   1           5     0.095665171    0.066658022     0.237420190     0.3912554      0.3333333          0.2248664
## 46 cg02887598   1   1   1  1   1           5     0.298943556    0.068421117     0.373777071     0.4329452      0.3333333          0.3014841
## 47 cg02932958   1   1   1  1   1           5     0.119976765    0.023889341     0.208520484     0.5167944      0.5000000          0.2738362
## 48 cg03084184   0   1   0  1   1           3     0.000000000    0.044329829     0.015641222     0.4458309      0.5000000          0.2011604
## 49 cg03088219   1   1   1  1   1           5     0.242526726    0.386706306     0.326265900     0.3305586      0.1666667          0.2905448
## 50 cg03221390   1   1   1  0   1           4     0.107855280    0.094349016     0.133511928     0.2168214      0.1666667          0.1438409
## 53 cg03549208   1   1   1  0   1           4     0.144872243    0.013354417     0.184090016     0.1924605      0.3333333          0.1736221
## 54 cg03600007   1   1   1  0   1           4     0.124581642    0.133851634     0.249180824     0.1581174      0.6666667          0.2664796
## 55 cg03660162   1   1   1  1   1           5     0.178105731    0.520174028     0.357981700     0.3014357      0.5000000          0.3715394
## 56 cg03672288   1   1   1  1   1           5     0.057943034    0.110735242     0.208194157     0.4151766      0.1666667          0.1917431
## 57 cg03737947   0   1   1  1   1           4     0.016829061    0.272380669     0.137723494     0.5336092      0.1666667          0.2254418
## 58 cg03749159   1   1   1  1   1           5     0.136655025    0.327177056     0.189032603     0.7288038      0.3333333          0.3430004
## 59 cg03979311   1   1   1  1   1           5     0.197215429    0.040125753     0.316823108     0.2858039      0.3333333          0.2346603
## 60 cg03982462   1   0   1  1   1           4     0.260179978    0.000000000     0.364985173     0.4805497      0.1666667          0.2544763
## 62 cg04124201   0   1   0  1   1           3     0.000000000    0.161334846     0.047890497     0.2991655      0.3333333          0.1683448
## 63 cg04156077   1   1   1  1   1           5     0.267989586    0.278838990     0.289741411     0.3698890      0.5000000          0.3412918
## 64 cg04242342   1   0   1  0   1           3     0.083050748    0.000000000     0.146924727     0.1305678      0.3333333          0.1387753
## 65 cg04316537   1   1   1  1   1           5     0.076660051    0.072015864     0.197150638     0.5322393      0.1666667          0.2089465
## 66 cg04412904   1   1   1  1   1           5     0.067907761    0.008001825     0.153993774     0.3323540      0.3333333          0.1791181
## 68 cg04497611   1   1   1  0   1           4     0.169004430    0.168998579     0.245534890     0.1676654      0.1666667          0.1835740
## 70 cg04645024   0   1   0  1   1           3     0.000000000    0.089605477     0.035342163     0.4314151      0.1666667          0.1446059
## 71 cg04664583   1   0   1  1   1           4     0.039767751    0.000000000     0.132274360     0.3593538      0.1666667          0.1396125
## 72 cg04718469   0   1   1  1   1           4     0.000000000    0.073525158     0.120957256     0.3833990      0.5000000          0.2155763
## 73 cg04768387   1   1   1  1   1           5     0.041259130    0.007136179     0.110328828     0.6617940      0.3333333          0.2307703
## 74 cg04831745   1   1   1  1   1           5     0.072362310    0.163128841     0.170924298     0.4914946      0.3333333          0.2462487
## 75 cg04867412   0   1   0  1   1           3     0.009952011    0.006644894     0.051653712     0.3521789      0.3333333          0.1507526
## 76 cg04888234   1   1   1  0   0           3     0.085173667    0.011359898     0.158877410     0.2305396      0.1666667          0.1305235
## 77 cg05064044   1   1   1  1   1           5     0.298161376    0.083498934     0.370533860     0.4486357      0.1666667          0.2734993
## 79 cg05161773   1   1   1  0   1           4     0.134312114    0.005676604     0.243185340     0.1921429      0.3333333          0.1817300
## 80 cg05234269   1   1   1  1   1           5     0.236394701    0.117738693     0.303552770     0.3708301      0.3333333          0.2723699
## 81 cg05321907   1   0   1  1   1           4     0.211489718    0.000000000     0.232925760     0.3334943      0.1666667          0.1889153
## 82 cg05392160   1   1   1  1   1           5     0.081302245    0.165992808     0.126803158     0.2377531      0.3333333          0.1890369
## 83 cg05593887   1   1   1  1   1           5     0.109637113    0.008367983     0.222451327     0.2341437      0.3333333          0.1815867
## 84 cg05850457   1   1   0  1   1           4     0.118261095    0.028639870     0.050214597     0.4977309      0.1666667          0.1723026
## 85 cg05876883   1   1   1  1   1           5     0.088595752    0.006101381     0.238365969     0.2767240      0.3333333          0.1886241
## 88 cg06118351   1   0   1  1   1           4     0.062151507    0.000000000     0.178593029     0.4821947      0.1666667          0.1779212
##  [ reached 'max' / getOption("max.print") -- omitted 195 rows ]
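As a quick sanity check on the table above, the Average_Importance column agrees with the row-wise mean of the five per-model importance columns. A minimal worked example using the age.now row printed above:

# Row-wise mean of the five per-model importances for age.now
mean(c(0.004163450, 0.634974445, 0.003392297, 0.5289698, 0.5000000))
# = 0.3343000, matching the Average_Importance column above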

8.2 Output - Write Files

Data Frame with selected features

# Output data frame with selected features based on the mean method:
# "selected_impAvg_ordered_NAME". Note that this data frame does not have a column named "SampleID".

if(Flag_8mean){
  filename_mean <- paste0("Selected_mean", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_mean <- paste0(OUTUT_CSV_PATHNAME, filename_mean)
  if (file.exists(OUTPUTPATH_mean)) {
    print("selected file based on mean already exists")
  } else {
    write.csv(df_selected_Mean,
              file = OUTPUTPATH_mean,
              row.names = FALSE)
  }
}
if(Flag_8median){
  filename_median <- paste0("Selected_median", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_median <- paste0(OUTUT_CSV_PATHNAME, filename_median)
  if (file.exists(OUTPUTPATH_median)) {
    print("selected file based on median already exists")
  } else {
    write.csv(df_selected_Median,
              file = OUTPUTPATH_median,
              row.names = FALSE)
  }
}
if(Flag_8Fequency){
  filename_frequency <- paste0("Selected_frequency", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_frequency <- paste0(OUTUT_CSV_PATHNAME, filename_frequency)
  if (file.exists(OUTPUTPATH_frequency)) {
    print("selected file based on frequency already exists")
  } else {
    write.csv(df_process_Output_freq,
              file = OUTPUTPATH_frequency,
              row.names = FALSE)
  }
}
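The three flag-guarded branches above repeat the same write-if-absent pattern; a small helper could remove the duplication. A minimal sketch, assuming the same flags, data frames, OUTUT_CSV_PATHNAME and INPUT_NUMBER_FEATURES defined earlier (write_selected_csv is a hypothetical helper, not part of the pipeline):

# Hypothetical helper: write a selection data frame only if the target
# CSV does not already exist
write_selected_csv <- function(df, method_label) {
  out_path <- paste0(OUTUT_CSV_PATHNAME,
                     "Selected_", method_label, "_",
                     INPUT_NUMBER_FEATURES, "_Features.csv")
  if (file.exists(out_path)) {
    print(paste("selected file based on", method_label, "already exists"))
  } else {
    write.csv(df, file = out_path, row.names = FALSE)
  }
}

# Usage mirroring the three branches above:
if (Flag_8mean)     write_selected_csv(df_selected_Mean, "mean")
if (Flag_8median)   write_selected_csv(df_selected_Median, "median")
if (Flag_8Fequency) write_selected_csv(df_process_Output_freq, "frequency")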

Phenotype Data Frame

# This is the flag for the phenotype data output.
# If set to TRUE, output the file: first check whether the file already exists at the given path;
# if not, write the file; if it already exists, do nothing.
# If set to FALSE, do not output the phenotype file.
# NOTICE THAT: the phenotype file is selected from "Merged_df".

phenotypeDF<-merged_df_raw[,colnames(phenoticPart_RAW)]
print(head(phenotypeDF))
##                                barcodes RID.a     prop.B    prop.NK   prop.CD4T  prop.CD8T  prop.Mono prop.Neutro prop.Eosino       DX  age.now PTGENDER  ABETA   TAU  PTAU          PC1           PC2
## 200223270003_R02C01 200223270003_R02C01  2190 0.03164651 0.03609239 0.010771839 0.01481567 0.06533409   0.8413395           0      MCI 82.40000     Male  963.2 341.5 35.48 -0.214185447  1.470293e-02
## 200223270003_R03C01 200223270003_R03C01  4080 0.03556363 0.04697771 0.002321312 0.06381941 0.04901806   0.8022999           0       CN 78.60000   Female  950.6 295.9 28.08 -0.172761185  5.745834e-02
## 200223270003_R06C01 200223270003_R06C01  4505 0.07129589 0.04412218 0.037684081 0.11457236 0.08745402   0.6448715           0       CN 80.40000   Female 1705.0 353.2 28.49 -0.003667305  8.372861e-02
## 200223270003_R07C01 200223270003_R07C01  1010 0.02081699 0.07117668 0.040966085 0.00000000 0.04459325   0.8224470           0 Dementia 78.16441     Male  493.3 272.8 22.75 -0.186779607 -1.117250e-02
## 200223270006_R01C01 200223270006_R01C01  4226 0.02680465 0.04767947 0.128514873 0.09085886 0.07419209   0.6319501           0      MCI 62.90000   Female 1705.0 253.1 22.84  0.026814649  1.650735e-05
## 200223270006_R04C01 200223270006_R04C01  1190 0.07063013 0.05250647 0.064529118 0.04309168 0.08796080   0.6812818           0       CN 80.67796   Female 1336.0 439.3 40.78 -0.037862929  1.571950e-02
##                              PC3   ageGroup ageGroupsq DX_num uniqueID  Horvath
## 200223270003_R02C01 -0.014043316  0.6606949 0.43651772      0        1 61.50365
## 200223270003_R03C01  0.005055871  0.2806949 0.07878961      0        1 69.26678
## 200223270003_R06C01  0.029143653  0.4606949 0.21223977      0        1 96.84418
## 200223270003_R07C01 -0.032302430  0.2371357 0.05623333      1        1 61.76446
## 200223270006_R01C01  0.052947950 -1.2893051 1.66230770      0        1 59.33885
## 200223270006_R04C01 -0.008685676  0.4884909 0.23862336      0        1 70.27197
OUTPUTPATH_phenotypePart <- paste0(OUTUT_CSV_PATHNAME, "PhenotypePart_df.csv")

if(phenoOutPUt_FLAG){
  if (file.exists(OUTPUTPATH_phenotypePart)) {
    print("Phenotype File already exists")
  } else {
    write.csv(phenotypeDF, file = OUTPUTPATH_phenotypePart, row.names = FALSE)
  }
}
## [1] "Phenotype File already exists"

9. Selected Feature Performance

9.1 Selected Based on Mean

9.1.1 Input Feature For Evaluation

Performance of the output features selected based on the mean method

# Evaluate the features selected by the mean method
processed_dataFrame <- df_selected_Mean
processed_data <- output_mean_process

AfterProcess_FeatureName <- selected_impAvg_ordered_NAME
print(head(output_mean_process))
## # A tibble: 6 × 251
##   DX    cg23432430      PC3       PC2 cg00962106      PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080 cg03749159 cg20678988 cg04156077
##   <fct>      <dbl>    <dbl>     <dbl>      <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CI         0.948 -0.0140    1.47e-2      0.912 -0.214        0.578      0.845     0.0369      0.748      0.869     0.0367      0.811     0.671       0.875      0.419      0.936      0.844      0.732
## 2 CN         0.946  0.00506   5.75e-2      0.538 -0.173        0.620      0.865     0.572       0.825      0.516     0.0536      0.783     0.793       0.920      0.442      0.915      0.855      0.687
## 3 CN         0.942  0.0291    8.37e-2      0.504 -0.00367      0.624      0.241     0.0318      0.818      0.903     0.0597      0.435     0.661       0.880      0.436      0.926      0.779      0.850
## 4 CI         0.943 -0.0323   -1.12e-2      0.904 -0.187        0.599      0.848     0.0383      0.758      0.531     0.609       0.850     0.808       0.915      0.957      0.629      0.826      0.680
## 5 CI         0.946  0.0529    1.65e-5      0.896  0.0268       0.631      0.821     0.930       0.826      0.926     0.0883      0.854     0.0829      0.917      0.946      0.929      0.330      0.891
## 6 CN         0.951 -0.00869   1.57e-2      0.886 -0.0379       0.615      0.784     0.540       0.210      0.894     0.919       0.738     0.845       0.923      0.399      0.612      0.854      0.837
## # ℹ 232 more variables: cg19503462 <dbl>, cg26853071 <dbl>, age.now <dbl>, cg11540596 <dbl>, cg00415024 <dbl>, cg10701746 <dbl>, cg00004073 <dbl>, cg11227702 <dbl>, cg19471911 <dbl>,
## #   cg09727210 <dbl>, cg00154902 <dbl>, cg17002719 <dbl>, cg07504457 <dbl>, cg25879395 <dbl>, cg01008088 <dbl>, cg02225060 <dbl>, cg12543766 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>,
## #   cg19799454 <dbl>, cg02887598 <dbl>, cg01128042 <dbl>, cg21697769 <dbl>, cg25208881 <dbl>, cg16779438 <dbl>, cg17386240 <dbl>, cg03088219 <dbl>, cg24883219 <dbl>, cg15535896 <dbl>,
## #   cg16338321 <dbl>, cg21757617 <dbl>, cg18285382 <dbl>, cg17429539 <dbl>, cg10738648 <dbl>, cg02078724 <dbl>, cg09015880 <dbl>, cg20823859 <dbl>, cg18816397 <dbl>, cg16431720 <dbl>,
## #   cg06833284 <dbl>, cg23517115 <dbl>, cg11438323 <dbl>, cg02932958 <dbl>, cg08096656 <dbl>, cg05064044 <dbl>, cg05234269 <dbl>, cg25169289 <dbl>, cg14710850 <dbl>, cg26679884 <dbl>,
## #   cg03600007 <dbl>, cg15098922 <dbl>, cg01921484 <dbl>, cg16715186 <dbl>, cg06961873 <dbl>, cg12240569 <dbl>, cg01910713 <dbl>, cg25712921 <dbl>, cg00648024 <dbl>, cg03982462 <dbl>,
## #   cg08745107 <dbl>, cg26983017 <dbl>, cg00084271 <dbl>, cg16858433 <dbl>, cg06371647 <dbl>, cg26846609 <dbl>, cg15184869 <dbl>, cg13573375 <dbl>, cg04831745 <dbl>, cg22931151 <dbl>, …
print(selected_impAvg_ordered_NAME)
##   [1] "cg23432430" "PC3"        "PC2"        "cg00962106" "PC1"        "cg07158503" "cg06697310" "cg11331837" "cg07634717" "cg03660162" "cg24851651" "cg11019791" "cg20685672" "cg26081710" "cg14168080"
##  [16] "cg03749159" "cg20678988" "cg04156077" "cg19503462" "cg26853071" "age.now"    "cg11540596" "cg00415024" "cg10701746" "cg00004073" "cg11227702" "cg19471911" "cg09727210" "cg00154902" "cg17002719"
##  [31] "cg07504457" "cg25879395" "cg01008088" "cg02225060" "cg12543766" "cg09120722" "cg11787167" "cg19799454" "cg02887598" "cg01128042" "cg21697769" "cg25208881" "cg16779438" "cg17386240" "cg03088219"
##  [46] "cg24883219" "cg15535896" "cg16338321" "cg21757617" "cg18285382" "cg17429539" "cg10738648" "cg02078724" "cg09015880" "cg20823859" "cg18816397" "cg16431720" "cg06833284" "cg23517115" "cg11438323"
##  [61] "cg02932958" "cg08096656" "cg05064044" "cg05234269" "cg25169289" "cg14710850" "cg26679884" "cg03600007" "cg15098922" "cg01921484" "cg16715186" "cg06961873" "cg12240569" "cg01910713" "cg25712921"
##  [76] "cg00648024" "cg03982462" "cg08745107" "cg26983017" "cg00084271" "cg16858433" "cg06371647" "cg26846609" "cg15184869" "cg13573375" "cg04831745" "cg22931151" "cg18918831" "cg07640670" "cg15600437"
##  [91] "cg01280698" "cg12689021" "cg27577781" "cg13405878" "cg22666875" "cg16536985" "cg16202259" "cg18857647" "cg22305850" "cg27224751" "cg09247979" "cg12333628" "cg16571124" "cg03979311" "cg12421087"
## [106] "cg15700429" "cg13739190" "cg00819121" "cg25436480" "cg04768387" "cg24634455" "cg11133939" "cg17042243" "cg22542451" "cg01608425" "cg06864789" "cg06880438" "cg13387643" "cg12702014" "cg03737947"
## [121] "cg02823329" "cg00696044" "cg06960717" "cg20673830" "cg25649515" "cg10681981" "cg15633912" "cg02668233" "cg27272246" "cg18150287" "cg18339359" "cg04718469" "cg01933473" "cg02122327" "cg18993517"
## [136] "cg02495179" "cg02356645" "cg09216282" "cg09584650" "cg00512739" "cg23352245" "cg12776173" "cg19301366" "cg25758034" "cg04316537" "cg14687298" "cg13226272" "cg13372276" "cg12556569" "cg06277607"
## [151] "cg17002338" "cg24307368" "cg14627380" "cg10091792" "cg08584917" "cg18819889" "cg24697433" "cg03084184" "cg23159970" "cg22112152" "cg12784167" "cg08198851" "cg17129965" "cg00939409" "cg08788093"
## [166] "cg09451339" "cg20078646" "cg10788927" "cg16089727" "cg00146240" "cg15775217" "cg18526121" "cg01662749" "cg14192979" "cg03672288" "cg25306893" "cg05392160" "cg05321907" "cg25277809" "cg05876883"
## [181] "cg06715136" "cg06483046" "cg14307563" "cg14170504" "cg04497611" "cg24139837" "cg05161773" "cg05593887" "cg11286989" "cg10240127" "cg27160885" "cg01549082" "cg04412904" "cg14532717" "cg06118351"
## [196] "cg22535849" "cg11706829" "cg00322003" "cg08554146" "cg02627240" "cg18029737" "cg17723206" "cg03549208" "cg21986118" "cg05850457" "cg09785377" "cg14293999" "cg07138269" "cg15985500" "cg14780448"
## [211] "cg04124201" "cg17738613" "cg17906851" "cg22169467" "cg22071943" "cg20981163" "cg10039445" "cg02246922" "cg08896901" "cg02631626" "cg11247378" "cg08857872" "cg00295418" "cg14507637" "cg18949721"
## [226] "cg11187460" "cg12146221" "cg08041188" "cg04867412" "cg00345083" "cg11268585" "cg21388339" "cg12228670" "cg23916408" "cg26901661" "cg21243064" "cg06403901" "cg15730644" "cg00322820" "cg04645024"
## [241] "cg24643105" "cg03221390" "cg21139150" "cg17131279" "cg15501526" "cg13653328" "cg24470466" "cg23836570" "cg13038195" "cg04664583"
print(head(df_selected_Mean))
##                     DX cg23432430          PC3        PC2 cg00962106          PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080
## 200223270003_R02C01 CI  0.9482702 -0.014043316 0.01470293  0.9124898 -0.214185447  0.5777146  0.8454609 0.03692842  0.7483382  0.8691767 0.03674702  0.8112324  0.6712101  0.8751040  0.4190123
## 200223270003_R03C01 CN  0.9455418  0.005055871 0.05745834  0.5375751 -0.172761185  0.6203543  0.8653044 0.57150125  0.8254434  0.5160770 0.05358297  0.7831231  0.7932091  0.9198212  0.4420256
## 200223270003_R06C01 CN  0.9418716  0.029143653 0.08372861  0.5040948 -0.003667305  0.6236025  0.2405168 0.03182862  0.8181246  0.9026304 0.05968923  0.4353250  0.6613646  0.8801892  0.4355521
##                     cg03749159 cg20678988 cg04156077 cg19503462 cg26853071 age.now cg11540596 cg00415024 cg10701746 cg00004073 cg11227702 cg19471911 cg09727210 cg00154902 cg17002719 cg07504457
## 200223270003_R02C01  0.9355921  0.8438718  0.7321883  0.7951675  0.4233820    82.4  0.9238951  0.4299553  0.4795503 0.02928535 0.86486075  0.6334393  0.4240111  0.5137741 0.04939181  0.7116230
## 200223270003_R03C01  0.9153921  0.8548886  0.6865805  0.4537684  0.7451354    78.6  0.8926595  0.3999122  0.4868342 0.02787198 0.49184121  0.8437175  0.8812928  0.8540746 0.40466475  0.6854539
## 200223270003_R06C01  0.9255807  0.7786685  0.8501188  0.6997359  0.4228079    80.4  0.8820252  0.7465084  0.4927257 0.64576463 0.02543724  0.6127952  0.8493743  0.8188126 0.51428089  0.7205633
##                     cg25879395 cg01008088 cg02225060 cg12543766 cg09120722 cg11787167 cg19799454 cg02887598 cg01128042 cg21697769 cg25208881 cg16779438 cg17386240  cg03088219 cg24883219 cg15535896
## 200223270003_R02C01 0.88130864  0.8424817  0.6828159 0.51028134  0.5878977 0.03853894  0.9178930 0.04020908  0.9113420  0.8946108  0.1851956  0.8826150  0.7473400 0.844002862  0.6430473  0.3382952
## 200223270003_R03C01 0.02603438  0.2417656  0.8265195 0.88741539  0.8287506 0.04673831  0.9106247 0.67073881  0.5328806  0.2822953  0.9092286  0.5466924  0.7144809 0.007435243  0.6822115  0.9253926
## 200223270003_R06C01 0.91060615  0.2618620  0.5209552 0.02818501  0.8793344 0.32564508  0.9066551 0.73408417  0.5222757  0.8698740  0.9265502  0.8629492  0.8074824 0.120155222  0.5296903  0.3320191
##                     cg16338321 cg21757617 cg18285382 cg17429539 cg10738648 cg02078724 cg09015880 cg20823859 cg18816397 cg16431720 cg06833284 cg23517115 cg11438323 cg02932958 cg08096656 cg05064044
## 200223270003_R02C01  0.5350242 0.03652647  0.3202927  0.7860900 0.44931577  0.3096774  0.5101716  0.9030711  0.5472925  0.7356099  0.9125144  0.2151144  0.4863471  0.7901008  0.9362594  0.5672851
## 200223270003_R03C01  0.8294062 0.44299089  0.2930577  0.7100923 0.49894016  0.2896133  0.8402106  0.6062985  0.4940355  0.8692449  0.9003482  0.9131440  0.8984559  0.4210489  0.9314878  0.5358875
## 200223270003_R06C01  0.4918708 0.44725379  0.8923595  0.7660838 0.05552024  0.2805612  0.8472063  0.8917348  0.5337018  0.8773137  0.6097933  0.8328364  0.8722772  0.3825995  0.4943033  0.5273964
##                     cg05234269 cg25169289 cg14710850 cg26679884 cg03600007 cg15098922 cg01921484 cg16715186 cg06961873 cg12240569 cg01910713 cg25712921 cg00648024 cg03982462 cg08745107 cg26983017
## 200223270003_R02C01 0.93848584  0.1100884  0.8048592  0.6793815  0.5658487  0.9286092  0.9098550  0.2742789  0.5335591 0.82772064  0.8573169  0.2829848 0.51410972  0.8562777 0.02921338 0.89868232
## 200223270003_R03C01 0.57461229  0.7667174  0.8090950  0.1848705  0.6018832  0.9027517  0.9093137  0.7946153  0.5472606 0.02690547  0.8538850  0.6220919 0.40202875  0.6023731 0.78542320 0.03145466
## 200223270003_R06C01 0.02467208  0.2264993  0.8285902  0.1701734  0.8611166  0.8525611  0.9204487  0.8124316  0.9415177 0.46030640  0.8110366  0.6384003 0.05579011  0.8778458 0.02709928 0.84677625
##                     cg00084271 cg16858433 cg06371647 cg26846609 cg15184869 cg13573375 cg04831745 cg22931151 cg18918831 cg07640670 cg15600437 cg01280698 cg12689021 cg27577781 cg13405878 cg22666875
## 200223270003_R02C01  0.8103611  0.9184356  0.8336894 0.48860949  0.8622328  0.8670419 0.61984995  0.9311023  0.4891660 0.58296513  0.4885353  0.8985067  0.7706828  0.8143535  0.4549662  0.8177182
## 200223270003_R03C01  0.7877006  0.9194211  0.8198684 0.04878986  0.8996252  0.1733934 0.71214149  0.9356702  0.5333801 0.55225610  0.4894487  0.8846201  0.7449475  0.8113185  0.7858042  0.8291957
## 200223270003_R06C01  0.7706165  0.9271632  0.8069537 0.48026945  0.8688117  0.8888246 0.06871768  0.9328614  0.6406575 0.04058533  0.8551374  0.8847132  0.7872237  0.8144274  0.7583938  0.3694180
##                     cg16536985 cg16202259 cg18857647 cg22305850 cg27224751 cg09247979 cg12333628 cg16571124 cg03979311 cg12421087 cg15700429 cg13739190 cg00819121 cg25436480 cg04768387 cg24634455
## 200223270003_R02C01  0.5789643  0.9548726  0.8582332 0.03361934 0.44503947  0.5070956  0.9227884  0.9282854 0.86644909  0.5647607  0.7879010  0.8510103  0.9207001  0.8425160  0.3131047  0.7796391
## 200223270003_R03C01  0.5418687  0.3713483  0.8394132 0.57522232 0.03214912  0.5706177  0.9092861  0.9206431 0.06199853  0.5399655  0.9114530  0.8358482  0.9281472  0.4994032  0.9465814  0.5188241
## 200223270003_R06C01  0.8392044  0.4852461  0.2647491 0.58548744 0.83123722  0.5090215  0.5084647  0.9276842 0.72615553  0.5400348  0.8838233  0.8419471  0.9327211  0.3494312  0.9098563  0.5325725
##                     cg11133939 cg17042243 cg22542451 cg01608425 cg06864789 cg06880438 cg13387643 cg12702014 cg03737947 cg02823329 cg00696044 cg06960717 cg20673830 cg25649515 cg10681981 cg15633912
## 200223270003_R02C01  0.1282694  0.2502905  0.5884356  0.9030410 0.05369415  0.8285145  0.4229959  0.7704049 0.91824910  0.9462397 0.55608424  0.7030978  0.2422052  0.9279829  0.7035090  0.1605530
## 200223270003_R03C01  0.5920898  0.2933475  0.8337068  0.9264388 0.46053125  0.7988881  0.4200273  0.7848681 0.92067153  0.6464005 0.07552381  0.7653402  0.6881735  0.9235753  0.7382662  0.9333421
## 200223270003_R06C01  0.5127706  0.2725457  0.8125084  0.8887753 0.87513655  0.7839538  0.4161488  0.8065993 0.03638091  0.9633930 0.79270858  0.7206218  0.2134634  0.5895839  0.6971989  0.8737362
##                     cg02668233 cg27272246 cg18150287 cg18339359 cg04718469 cg01933473 cg02122327 cg18993517 cg02495179 cg02356645 cg09216282 cg09584650 cg00512739 cg23352245 cg12776173 cg19301366
## 200223270003_R02C01  0.4708431  0.8615873  0.7685695  0.8824858  0.8687522  0.2589014 0.38940091  0.2091538  0.6813307  0.5105903  0.9349248 0.08230254  0.9337648  0.9377232  0.1038804  0.8831393
## 200223270003_R03C01  0.8841930  0.8705287  0.7519166  0.9040272  0.7256813  0.6726133 0.37769608  0.2665896  0.7373055  0.5833923  0.9244259 0.09661586  0.8863895  0.9375774  0.8730635  0.8072679
## 200223270003_R06C01  0.4575646  0.8103777  0.2501173  0.8552121  0.8521881  0.2642560 0.04017909  0.2574003  0.5588114  0.5701428  0.9263996 0.52399749  0.9242748  0.5932742  0.7009491  0.8796022
##                     cg25758034 cg04316537 cg14687298 cg13226272 cg13372276 cg12556569 cg06277607 cg17002338 cg24307368 cg14627380 cg10091792 cg08584917 cg18819889 cg24697433 cg03084184 cg23159970
## 200223270003_R02C01  0.6114028  0.8074830 0.04206702 0.02637249 0.04888111 0.06218231 0.10744587  0.9286251 0.64323677  0.9455369  0.8670733  0.5663205  0.9156157  0.9243095  0.8162981 0.61817246
## 200223270003_R03C01  0.6649219  0.8453340 0.14813581 0.54100016 0.62396373 0.03924599 0.09353494  0.2684163 0.34980461  0.9258964  0.5864221  0.9019732  0.9004455  0.6808390  0.7877128 0.57492600
## 200223270003_R06C01  0.2393844  0.4351695 0.24260002 0.44370701 0.59693465 0.48636893 0.09504696  0.2811103 0.02720398  0.5789898  0.6087997  0.9187789  0.9054439  0.6384606  0.4546397 0.03288909
##                     cg22112152 cg12784167 cg08198851 cg17129965 cg00939409 cg08788093 cg09451339 cg20078646 cg10788927 cg16089727 cg00146240 cg15775217 cg18526121 cg01662749 cg14192979 cg03672288
## 200223270003_R02C01  0.8476101 0.81503498  0.6578905  0.8972140  0.2652180 0.03911678  0.2243746 0.06198170  0.8973154 0.86748697  0.6336151  0.5707441  0.4519781  0.3506201 0.06336040  0.9235592
## 200223270003_R03C01  0.8014136 0.02811410  0.6578186  0.8806673  0.8882671 0.60934160  0.2340702 0.89537412  0.2021398 0.54996692  0.8957183  0.9168327  0.4762313  0.2510946 0.06019651  0.6718625
## 200223270003_R06C01  0.7897897 0.03073269  0.1272153  0.8857237  0.8842646 0.88380243  0.8921284 0.08725521  0.2053075 0.05876736  0.1433218  0.6042521  0.4833367  0.8061480 0.52114282  0.9007629
##                     cg25306893 cg05392160 cg05321907 cg25277809 cg05876883 cg06715136 cg06483046 cg14307563 cg14170504 cg04497611 cg24139837 cg05161773 cg05593887 cg11286989 cg10240127 cg27160885
## 200223270003_R02C01  0.6265392  0.9328933  0.2880477  0.1632342  0.9039064  0.3400192 0.04383925  0.1855966 0.54915621  0.9086359 0.07404605  0.4120912  0.5939220  0.7590008  0.9250553  0.2231606
## 200223270003_R03C01  0.8330282  0.2576881  0.1782629  0.4913711  0.9223308  0.9259109 0.50720277  0.8916957 0.02236650  0.8818513 0.04183445  0.4154907  0.5766550  0.8533989  0.9403255  0.8263885
## 200223270003_R06C01  0.6175380  0.8920726  0.8427929  0.5952124  0.4697980  0.9079807 0.89604910  0.8750052 0.02988245  0.5853116 0.05657120  0.8526849  0.9148338  0.7313884  0.9056974  0.2121179
##                     cg01549082 cg04412904 cg14532717 cg06118351 cg22535849 cg11706829 cg00322003 cg08554146 cg02627240 cg18029737 cg17723206 cg03549208 cg21986118 cg05850457 cg09785377 cg14293999
## 200223270003_R02C01  0.2924138 0.05088595  0.5732280  0.3633940  0.8847704  0.8897234  0.1759911  0.8982080 0.66706843  0.9100454 0.92881042  0.9014487  0.6658175  0.8183013  0.9162088  0.2836710
## 200223270003_R03C01  0.7065693 0.07717659  0.1107638  0.4714860  0.8609966  0.5444785  0.5702070  0.8963074 0.57129408  0.9016634 0.48556255  0.8381784  0.6571296  0.8313023  0.9226292  0.9172023
## 200223270003_R06C01  0.2895440 0.08253743  0.6273416  0.8655962  0.8808022  0.5669449  0.3077122  0.8213878 0.05309659  0.7376586 0.01765023  0.9097817  0.7034445  0.8161364  0.6405193  0.9168166
##                     cg07138269 cg15985500 cg14780448 cg04124201 cg17738613 cg17906851 cg22169467 cg22071943 cg20981163 cg10039445 cg02246922 cg08896901 cg02631626 cg11247378 cg08857872 cg00295418
## 200223270003_R02C01  0.5002290  0.8555262  0.9119141  0.8686421  0.6879612  0.9488392  0.3095010  0.8705217  0.8990628  0.8833873  0.7301201  0.3581911  0.6280766  0.1591185  0.3395280 0.44954665
## 200223270003_R03C01  0.9426707  0.8312198  0.6702102  0.3308589  0.6582258  0.9529718  0.2978585  0.2442648  0.9264076  0.8954055  0.9447019  0.2467071  0.1951736  0.7874849  0.8181845 0.48471295
## 200223270003_R06C01  0.5057781  0.8492103  0.6207355  0.3241613  0.1022257  0.6462151  0.8955853  0.2644581  0.4874651  0.8832807  0.7202230  0.9225209  0.2699849  0.4807942  0.2970779 0.02004532
##                     cg14507637 cg18949721 cg11187460 cg12146221 cg08041188 cg04867412 cg00345083 cg11268585 cg21388339 cg12228670 cg23916408 cg26901661 cg21243064 cg06403901 cg15730644 cg00322820
## 200223270003_R02C01  0.9051258  0.2334245 0.03672179  0.2049284  0.7752456 0.04304823 0.47960968  0.2521544  0.2756268  0.8632174  0.1942275  0.8951971  0.5191606 0.92790690  0.4803181  0.4869764
## 200223270003_R03C01  0.9009460  0.2437792 0.92516409  0.1814927  0.3201255 0.87967997 0.50833875  0.8535791  0.2102269  0.8496212  0.9154993  0.8754981  0.9167649 0.04783341  0.4353906  0.4858988
## 200223270003_R06C01  0.9013686  0.2523095 0.03109553  0.8619250  0.7900939 0.44971146 0.03929249  0.9121931  0.7649181  0.8738949  0.8886255  0.9021064  0.4862205 0.05253626  0.8763048  0.4754313
##                     cg04645024 cg24643105 cg03221390 cg21139150 cg17131279 cg15501526 cg13653328 cg24470466 cg23836570 cg13038195 cg04664583
## 200223270003_R02C01  0.7366541  0.5303418  0.5859063 0.01853264  0.1900637  0.6362531  0.9245434  0.7725300 0.58688450 0.45882213  0.5572814
## 200223270003_R03C01  0.8454827  0.5042688  0.9180706 0.43223243  0.7048637  0.6319253  0.5122938  0.9041432 0.54259383 0.02740132  0.5881190
## 200223270003_R06C01  0.0871902  0.9383050  0.6399867 0.43772680  0.1492861  0.7435100  0.9362798  0.1206738 0.03267304 0.46284376  0.9352717
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]

9.1.2 Logistic Regression Model

9.1.2.1 Logistic Regression Model Training

df_LRM1 <- processed_data
featureName_LRM1 <- AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
# Stratified 70/30 train-test split on the diagnosis label
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 251
dim(testData)
## [1] 194 251
# 5-fold cross-validation to tune the elastic-net (glmnet) model
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

# Evaluate on the held-out test set
predictions <- predict(model_LRM1, newdata = testData, type = "raw")
cm_FeatEval_Mean_LRM1 <- caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Mean_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 120  20
##         CN   8  46
##                                           
##                Accuracy : 0.8557          
##                  95% CI : (0.7982, 0.9019)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 5.787e-10       
##                                           
##                   Kappa : 0.6637          
##                                           
##  Mcnemar's Test P-Value : 0.03764         
##                                           
##             Sensitivity : 0.9375          
##             Specificity : 0.6970          
##          Pos Pred Value : 0.8571          
##          Neg Pred Value : 0.8519          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6186          
##    Detection Prevalence : 0.7216          
##       Balanced Accuracy : 0.8172          
##                                           
##        'Positive' Class : CI              
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Mean_LRM1_Accuracy <- cm_FeatEval_Mean_LRM1$overall["Accuracy"]
cm_FeatEval_Mean_LRM1_Kappa <- cm_FeatEval_Mean_LRM1$overall["Kappa"]

print(cm_FeatEval_Mean_LRM1_Accuracy)
##  Accuracy 
## 0.8556701
print(cm_FeatEval_Mean_LRM1_Kappa)
##     Kappa 
## 0.6636949
print(model_LRM1)
## glmnet 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001769938  0.7906471  0.5288227
##   0.10   0.0017699384  0.7862759  0.5169254
##   0.10   0.0176993845  0.7950183  0.5282824
##   0.55   0.0001769938  0.7532357  0.4484439
##   0.55   0.0017699384  0.7487912  0.4385364
##   0.55   0.0176993845  0.7268132  0.3675163
##   1.00   0.0001769938  0.7312576  0.4017938
##   1.00   0.0017699384  0.7334310  0.3991983
##   1.00   0.0176993845  0.6694994  0.2129710
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01769938.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Mean_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.997797356828194"
print(FeatEval_Mean_LRM1_trainAccuracy)
## [1] 0.9977974
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7483299
FeatEval_Mean_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Mean_mean_accuracy_cv_LRM1)
## [1] 0.7483299
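Note that the mean over the whole tuning grid mixes well- and poorly-tuned settings; the cross-validated accuracy at the chosen tuning parameters is often the more informative summary. A minimal sketch, assuming model_LRM1 as trained above:

# Hedged sketch: CV accuracy at the selected (alpha, lambda) only.
# merge() matches bestTune's alpha/lambda against the full results grid.
best_row_LRM1 <- merge(model_LRM1$bestTune, model_LRM1$results)
print(best_row_LRM1$Accuracy)
# From the resampling table above: 0.7950183 for alpha = 0.1, lambda = 0.0176993845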
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG ==6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.8995
## [1] "The auc value is:"
## Area under the curve: 0.8995

if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # Use a consistent colour index (class i -> colour i + 1) so the plotted
  # curves match the legend colours below
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")

  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_LRM1_AUC <- mean_auc
}
print(FeatEval_Mean_LRM1_AUC)
## Area under the curve: 0.8995
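The binary ROC branches above differ only in which class column of prob_predictions feeds roc(); a hedged refactor sketch (same objects as above, flag-to-class mapping taken from the branches themselves) collapses them into one lookup:

# Map each binary METHOD_FEATURE_FLAG to its positive class, as in the
# branches above; switch() returns NULL for the multi-class flag (1)
positive_class <- switch(as.character(METHOD_FEATURE_FLAG),
                         "3" = "CI", "4" = "Dementia",
                         "5" = "MCI", "6" = "Dementia")
if (!is.null(positive_class)) {
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, positive_class],
                   levels = rev(levels(testData$DX)))
  FeatEval_Mean_LRM1_AUC <- roc_curve$auc
  print(roc_curve)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}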
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC2         100.00
## PC1          98.69
## PC3          85.31
## cg09727210   70.51
## cg23432430   69.28
## cg07158503   58.72
## cg00962106   56.58
## cg06697310   56.38
## cg09015880   51.41
## cg02225060   50.11
## cg10701746   50.08
## cg16338321   48.90
## cg00819121   48.72
## cg14168080   46.30
## cg21757617   45.96
## cg00415024   45.10
## cg01910713   43.96
## cg16858433   43.85
## cg00004073   43.50
## cg05064044   43.33
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
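One detail worth noting (a general caret behaviour, not specific to this pipeline): varImp() on a train object rescales importances to 0-100 by default, which is why the top feature reads exactly 100.00 above. Raw values can be requested instead:

# Unscaled importances (for glmnet, absolute coefficients at the
# selected lambda rather than values rescaled to 0-100)
importance_raw_LRM1 <- varImp(model_LRM1, scale = FALSE)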

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){

  importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

  library(dplyr)
  # Preserve the feature names: dplyr::arrange() drops data.frame rownames,
  # so move them into an explicit column first
  ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>%
    tibble::rownames_to_column("Feature") %>%
    arrange(desc(Overall))

  print(ordered_importance_final_model_LRM1)

}
##         Overall
## 1   2.575692864
## 2   2.542053072
## 3   2.197354106
## 4   1.816131146
## 5   1.784377261
## 6   1.512529211
## 7   1.457354838
## 8   1.452102351
## 9   1.324131788
## 10  1.290796056
## 11  1.289888272
## 12  1.259479369
## 13  1.254889417
## 14  1.192482649
## 15  1.183767581
## 16  1.161712332
## 17  1.132341707
## 18  1.129518398
## 19  1.120355548
## 20  1.115958347
## 21  1.089229656
## 22  1.070239210
## 23  1.063103293
## 24  1.056673418
## 25  1.022051419
## 26  1.013753244
## 27  1.013292944
## 28  0.984087268
## 29  0.968113111
## 30  0.965918978
## 31  0.957516516
## 32  0.948100067
## 33  0.925222146
## 34  0.911930614
## 35  0.908244307
## 36  0.900596205
## 37  0.896025272
## 38  0.893247114
## 39  0.892859194
## 40  0.890991158
## 41  0.871024852
## 42  0.869154610
## 43  0.858232864
## 44  0.853227137
## 45  0.844920488
## 46  0.843025564
## 47  0.841183418
## 48  0.838507743
## 49  0.838473842
## 50  0.837376903
## 51  0.832891228
## 52  0.831126903
## 53  0.829320231
## 54  0.823112068
## 55  0.813723810
## 56  0.810517499
## 57  0.808460674
## 58  0.793685869
## 59  0.783330192
## 60  0.777346379
## 61  0.775260563
## 62  0.767912888
## 63  0.762913278
## 64  0.754787040
## 65  0.750048569
## 66  0.742578777
## 67  0.741918621
## 68  0.733326484
## 69  0.731576115
## 70  0.731287648
## 71  0.728866134
## 72  0.728545425
## 73  0.713293481
## 74  0.710479412
## 75  0.704320784
## 76  0.694469169
## 77  0.693026530
## 78  0.689134681
## 79  0.687787010
## 80  0.679343444
## 81  0.678871612
## 82  0.678780995
## 83  0.677962448
## 84  0.662304720
## 85  0.660778233
## 86  0.656952198
## 87  0.635141474
## 88  0.633182780
## 89  0.630696736
## 90  0.629830475
## 91  0.624731818
## 92  0.613544147
## 93  0.611224214
## 94  0.609806445
## 95  0.609002005
## 96  0.608444817
## 97  0.605916469
## 98  0.602029916
## 99  0.599394485
## 100 0.599387927
## 101 0.598490991
## 102 0.594828101
## 103 0.588317588
## 104 0.586343910
## 105 0.582430093
## 106 0.578700589
## 107 0.577208493
## 108 0.577002200
## 109 0.575875530
## 110 0.574108522
## 111 0.566893230
## 112 0.563722174
## 113 0.559403692
## 114 0.549523775
## 115 0.536809850
## 116 0.529216227
## 117 0.526178401
## 118 0.525132755
## 119 0.519479990
## 120 0.514053496
## 121 0.509019365
## 122 0.505572975
## 123 0.497233088
## 124 0.496719757
## 125 0.492379723
## 126 0.490465569
## 127 0.485555416
## 128 0.485313840
## 129 0.479896214
## 130 0.472963971
## 131 0.472361978
## 132 0.464395942
## 133 0.463328462
## 134 0.460102158
## 135 0.443913239
## 136 0.441228788
## 137 0.439325516
## 138 0.430118866
## 139 0.425679134
## 140 0.419083943
## 141 0.413494863
## 142 0.409679903
## 143 0.408151726
## 144 0.406264206
## 145 0.404807006
## 146 0.399229737
## 147 0.394315450
## 148 0.391812918
## 149 0.389307051
## 150 0.387654146
## 151 0.384334949
## 152 0.382800467
## 153 0.382209818
## 154 0.380188174
## 155 0.372893997
## 156 0.372804811
## 157 0.366827459
## 158 0.362219231
## 159 0.358902853
## 160 0.351215340
## 161 0.347423875
## 162 0.346145125
## 163 0.346027325
## 164 0.341567796
## 165 0.339393863
## 166 0.338184377
## 167 0.330986362
## 168 0.329965958
## 169 0.328259486
## 170 0.314654888
## 171 0.311613050
## 172 0.310423928
## 173 0.307953011
## 174 0.307613316
## 175 0.306407759
## 176 0.302944819
## 177 0.295214081
## 178 0.294314377
## 179 0.292346726
## 180 0.289547825
## 181 0.289167889
## 182 0.284434318
## 183 0.284329874
## 184 0.271911647
## 185 0.267502570
## 186 0.265934703
## 187 0.259937561
## 188 0.258762515
## 189 0.253118489
## 190 0.251899874
## 191 0.251489633
## 192 0.250267143
## 193 0.246932984
## 194 0.240702418
## 195 0.237425205
## 196 0.232852446
## 197 0.231602485
## 198 0.226941316
## 199 0.226892151
## 200 0.226400272
## 201 0.206860745
## 202 0.204994848
## 203 0.204284801
## 204 0.188957830
## 205 0.187016874
## 206 0.186462873
## 207 0.185552444
## 208 0.179906013
## 209 0.175633504
## 210 0.173452301
## 211 0.165920730
## 212 0.162270794
## 213 0.147281845
## 214 0.142958178
## 215 0.138111136
## 216 0.132030882
## 217 0.128730689
## 218 0.118468385
## 219 0.118045139
## 220 0.116275901
## 221 0.113316751
## 222 0.110236805
## 223 0.101716698
## 224 0.097261815
## 225 0.092767633
## 226 0.085636463
## 227 0.082232843
## 228 0.079370146
## 229 0.077248230
## 230 0.076395110
## 231 0.041891109
## 232 0.028100076
## 233 0.023340593
## 234 0.016788164
## 235 0.015744479
## 236 0.015259323
## 237 0.008333347
## 238 0.006346333
## 239 0.002392991
## 240 0.000000000
## 241 0.000000000
## 242 0.000000000
## 243 0.000000000
## 244 0.000000000
## 245 0.000000000
## 246 0.000000000
## 247 0.000000000
## 248 0.000000000
## 249 0.000000000
## 250 0.000000000
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
# Load reshape2, installing it first if needed
# (require() both checks for and attaches the package)
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
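Since reshape2 is retired upstream, an equivalent reshape with tidyr may be preferable; a sketch under the same assumptions (importance_model_LRM1_df with its Feature and per-class columns, i.e. the multi-class branch only):

# tidyr equivalent of the melt() call above (multi-class case only)
importance_long_LRM1 <- importance_model_LRM1_df %>%
  dplyr::select(-MaxImportance) %>%
  tidyr::pivot_longer(-Feature, names_to = "Class", values_to = "Importance")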

9.1.2.2 Model Diagnosis & Improvement

9.1.2.2.1 Class Imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##  CI  CN 
## 427 221
prop.table(table(df_LRM1$DX))
## 
##        CI        CN 
## 0.6589506 0.3410494
table(trainData$DX)
## 
##  CI  CN 
## 299 155
prop.table(table(trainData$DX))
## 
##        CI        CN 
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training Data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio, which is the ratio of the number of samples in the majority class to the number of samples in the minority class. severe class imbalance will be indicated by high ratio.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 1.932127
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 1.929032
  • Let’s do Chi-square test which could determine if the class distribution significantly deviates from a balanced distribution. The p-value provided by the test will indicate the significance of class imbalance.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 65.488, df = 1, p-value = 5.848e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 45.674, df = 1, p-value = 1.397e-11
Solve Class Imbalance using "SMOTE" (NOT FINAL YET; MAY NEED FURTHER IMPROVEMENT)
library(smotefamily)

# Oversample the minority class (CN) on the training data only, so that
# no synthetic samples leak into the held-out test set
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)

# Extract the new balanced dataset and restore the "DX" column name
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##  CI  CN 
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 251
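Since the SMOTE step is flagged as not final, one hedged alternative (a sketch, not the pipeline's method) is to keep the original training data and instead pass case weights so the minority class (CN) counts proportionally more during fitting; caret::train() forwards case weights to glmnet:

# Hypothetical weighted fit: upweight CN by the majority/minority ratio.
# Assumes trainData and ctrl as defined above.
class_freq <- table(trainData$DX)
obs_weights <- ifelse(trainData$DX == "CN",
                      as.numeric(class_freq["CI"] / class_freq["CN"]),
                      1)
model_LRM_weighted <- caret::train(DX ~ ., data = trainData,
                                   method = "glmnet", trControl = ctrl,
                                   weights = obs_weights)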
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

# Refit the elastic-net model on the SMOTE-balanced training data
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 117  17
##         CN  11  49
##                                           
##                Accuracy : 0.8557          
##                  95% CI : (0.7982, 0.9019)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 5.787e-10       
##                                           
##                   Kappa : 0.6713          
##                                           
##  Mcnemar's Test P-Value : 0.3447          
##                                           
##             Sensitivity : 0.9141          
##             Specificity : 0.7424          
##          Pos Pred Value : 0.8731          
##          Neg Pred Value : 0.8167          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6031          
##    Detection Prevalence : 0.6907          
##       Balanced Accuracy : 0.8282          
##                                           
##        'Positive' Class : CI              
## 
print(model_LRM2)
## glmnet 
## 
## 609 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 487, 488, 487, 487, 487 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001970375  0.8932259  0.7860533
##   0.10   0.0019703751  0.8997832  0.7992033
##   0.10   0.0197037514  0.8965045  0.7925593
##   0.55   0.0001970375  0.8800975  0.7596369
##   0.55   0.0019703751  0.8784447  0.7563681
##   0.55   0.0197037514  0.8325295  0.6641846
##   1.00   0.0001970375  0.8751795  0.7498703
##   1.00   0.0019703751  0.8620512  0.7233841
##   1.00   0.0197037514  0.7783634  0.5555074
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.001970375.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8662422
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC3         100.00
## PC1          74.85
## PC2          64.52
## cg09727210   47.86
## cg23432430   42.70
## cg06697310   40.13
## cg07158503   38.56
## cg09015880   38.49
## cg01910713   37.02
## cg10701746   37.02
## cg16858433   36.92
## cg00962106   36.72
## cg02225060   35.61
## cg16338321   34.02
## cg00819121   33.92
## cg05064044   31.40
## cg14168080   31.32
## cg26081710   30.24
## cg21757617   30.22
## cg00415024   30.21
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
##         Overall
## 1   7.626168392
## 2   5.708243293
## 3   4.920738272
## 4   3.650035214
## 5   3.256426793
##  ---
## 249 0.000000000
## 250 0.000000000
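Note that `varImp()` on the final glmnet model prints bare row numbers above, so the feature labels are lost. A minimal sketch to recover named importances, assuming `model_LRM2` is the caret glmnet fit trained above:

# Coefficients of the underlying glmnet fit at the tuned lambda; the
# rownames carry the feature labels that the print above dropped.
best_lambda <- model_LRM2$bestTune$lambda
coef_mat <- as.matrix(coef(model_LRM2$finalModel, s = best_lambda))
named_imp <- data.frame(Feature = rownames(coef_mat),
                        Importance = abs(coef_mat[, 1]))
named_imp <- named_imp[named_imp$Feature != "(Intercept)", ]
head(named_imp[order(-named_imp$Importance), ], 20)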
if(METHOD_FEATURE_FLAG==1){
  # for the multi-class classification case,
  # for each feature, keep the maximum importance value across classes
  # Add a column for the maximum importance
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM2_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.8929
## [1] "The auc value is:"
## Area under the curve: 0.8929

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}
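For the multi-class case, a cross-check sketch, assuming `pROC` is loaded and `prob_predictions` still holds the per-class probabilities computed above: `pROC::multiclass.roc` computes a multi-class AUC directly, which can be compared against the one-versus-rest mean.

if(METHOD_FEATURE_FLAG ==1){
    # Direct multi-class AUC (pairwise averaging) as a sanity check on the
    # one-vs-rest mean reported above.
    mroc <- pROC::multiclass.roc(testData$DX, prob_predictions)
    print(mroc$auc)
}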

9.1.3. Elastic Net

9.1.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.8105250  0.57122481
##   0      0.05357895  0.8391697  0.62561988
##   0      0.10615789  0.8325275  0.60761462
##   0      0.15873684  0.8237118  0.58287645
##   0      0.21131579  0.8193162  0.56747468
##   0      0.26389474  0.8237363  0.57533129
##   0      0.31647368  0.8215140  0.56784456
##   0      0.36905263  0.8061050  0.52474966
##   0      0.42163158  0.8061050  0.51999323
##   0      0.47421053  0.8083272  0.52161492
##   0      0.52678947  0.8017094  0.50214775
##   0      0.57936842  0.7950672  0.47863449
##   0      0.63194737  0.7862271  0.45054427
##   0      0.68452632  0.7686203  0.39777307
##   0      0.73710526  0.7642002  0.38396262
##   0      0.78968421  0.7598046  0.36818319
##   0      0.84226316  0.7553846  0.35374295
##   0      0.89484211  0.7509890  0.34013156
##   0      0.94742105  0.7487912  0.33332759
##   0      1.00000000  0.7443956  0.31674039
##   1      0.00100000  0.7356532  0.40960120
##   1      0.05357895  0.6608059  0.03185117
##   1      0.10615789  0.6585836  0.00000000
##   1      0.15873684  0.6585836  0.00000000
##   1      0.21131579  0.6585836  0.00000000
##   1      0.26389474  0.6585836  0.00000000
##   1      0.31647368  0.6585836  0.00000000
##   1      0.36905263  0.6585836  0.00000000
##   1      0.42163158  0.6585836  0.00000000
##   1      0.47421053  0.6585836  0.00000000
##   1      0.52678947  0.6585836  0.00000000
##   1      0.57936842  0.6585836  0.00000000
##   1      0.63194737  0.6585836  0.00000000
##   1      0.68452632  0.6585836  0.00000000
##   1      0.73710526  0.6585836  0.00000000
##   1      0.78968421  0.6585836  0.00000000
##   1      0.84226316  0.6585836  0.00000000
##   1      0.89484211  0.6585836  0.00000000
##   1      0.94742105  0.6585836  0.00000000
##   1      1.00000000  0.6585836  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.05357895.
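One caveat on the grid above: `alpha = 0:1` expands to just `c(0, 1)`, so only the pure ridge and pure lasso endpoints were evaluated. A minimal sketch of a grid that also covers intermediate elastic-net mixes (`param_grid_mix` is an illustrative name):

# Search genuine elastic-net mixtures in addition to the two endpoints.
param_grid_mix <- expand.grid(alpha = seq(0, 1, by = 0.25),
                              lambda = seq(0.001, 1, length = 20))
# elastic_net_model1 could then be retrained with tuneGrid = param_grid_mix.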
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
FeatEval_Mean_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Mean_mean_accuracy_cv_ENM1)
## [1] 0.7279298
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

FeatEval_Mean_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.997797356828194"
print(FeatEval_Mean_ENM1_trainAccuracy)
## [1] 0.9977974
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Mean_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Mean_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 125  16
##         CN   3  50
##                                         
##                Accuracy : 0.9021        
##                  95% CI : (0.8513, 0.94)
##     No Information Rate : 0.6598        
##     P-Value [Acc > NIR] : 3.867e-15     
##                                         
##                   Kappa : 0.7709        
##                                         
##  Mcnemar's Test P-Value : 0.005905      
##                                         
##             Sensitivity : 0.9766        
##             Specificity : 0.7576        
##          Pos Pred Value : 0.8865        
##          Neg Pred Value : 0.9434        
##              Prevalence : 0.6598        
##          Detection Rate : 0.6443        
##    Detection Prevalence : 0.7268        
##       Balanced Accuracy : 0.8671        
##                                         
##        'Positive' Class : CI            
## 
cm_FeatEval_Mean_ENM1_Accuracy<-cm_FeatEval_Mean_ENM1$overall["Accuracy"]
cm_FeatEval_Mean_ENM1_Kappa<-cm_FeatEval_Mean_ENM1$overall["Kappa"]
print(cm_FeatEval_Mean_ENM1_Accuracy)
##  Accuracy 
## 0.9020619
print(cm_FeatEval_Mean_ENM1_Kappa)
##     Kappa 
## 0.7709136
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC2         100.00
## PC1          79.34
## PC3          78.32
## cg23432430   55.02
## cg09727210   53.22
## cg07158503   49.85
## cg00962106   46.66
## cg06697310   45.03
## cg02225060   42.44
## cg09015880   40.33
## cg16338321   40.04
## cg00819121   38.08
## cg10701746   37.31
## cg01910713   37.05
## cg16858433   37.01
## cg00415024   36.88
## cg05064044   36.51
## cg21757617   36.22
## cg00004073   35.95
## cg02887598   35.05
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))

print(Ordered_importance_elastic_net_final_model1) 
  
}
##         Overall
## 1   2.148573032
## 2   1.706255190
## 3   1.684378455
## 4   1.185546539
## 5   1.147028334
##  ---
## 249 0.009043756
## 250 0.007478005
if(METHOD_FEATURE_FLAG==1){
  # for the multi-class classification case,
  # for each feature, keep the maximum importance value across classes
  # Add a column for the maximum importance
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value) 
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.9157

if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_ENM1_AUC <- mean_auc
}

9.1.4. XGBoost

9.1.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa      
##   0.3  1          0.6               0.50        50      0.6365812   0.10788567
##   0.3  1          0.6               0.50       100      0.6674481   0.21995123
##   0.3  1          0.6               0.50       150      0.6849573   0.25290695
##   0.3  1          0.6               0.75        50      0.6189255   0.04382282
##   0.3  1          0.6               0.75       100      0.6476190   0.13266136
##   0.3  1          0.6               0.75       150      0.6586325   0.17109323
##   0.3  1          0.6               1.00        50      0.6101099  -0.02231134
##   0.3  1          0.6               1.00       100      0.6344078   0.06988703
##   0.3  1          0.6               1.00       150      0.6342857   0.07373553
##   0.3  1          0.8               0.50        50      0.6189499   0.06631012
##   0.3  1          0.8               0.50       100      0.6410256   0.15370968
##   0.3  1          0.8               0.50       150      0.6828571   0.25808227
##   0.3  1          0.8               0.75        50      0.6057631   0.01027818
##   0.3  1          0.8               0.75       100      0.6409768   0.12634940
##   0.3  1          0.8               0.75       150      0.6542125   0.16304486
##   0.3  1          0.8               1.00        50      0.5836874  -0.08348795
##   0.3  1          0.8               1.00       100      0.6321612   0.05948269
##   0.3  1          0.8               1.00       150      0.6365568   0.08324654
##   0.3  2          0.6               0.50        50      0.6522100   0.15464981
##   0.3  2          0.6               0.50       100      0.6874237   0.22369222
##   0.3  2          0.6               0.50       150      0.6676190   0.17752609
##   0.3  2          0.6               0.75        50      0.6320879   0.08757399
##   0.3  2          0.6               0.75       100      0.6607570   0.15348443
##   0.3  2          0.6               0.75       150      0.6673993   0.15993719
##   0.3  2          0.6               1.00        50      0.6189988   0.04315657
##   0.3  2          0.6               1.00       100      0.6255922   0.06107384
##   0.3  2          0.6               1.00       150      0.6344567   0.07882187
##   0.3  2          0.8               0.50        50      0.6299145   0.09854096
##   0.3  2          0.8               0.50       100      0.6762882   0.21190122
##   0.3  2          0.8               0.50       150      0.6784371   0.22950354
##   0.3  2          0.8               0.75        50      0.6586569   0.13460074
##   0.3  2          0.8               0.75       100      0.6585348   0.14386674
##   0.3  2          0.8               0.75       150      0.6828083   0.20497846
##   0.3  2          0.8               1.00        50      0.6366300   0.05173251
##   0.3  2          0.8               1.00       100      0.6543101   0.10006168
##   0.3  2          0.8               1.00       150      0.6520391   0.09450595
##   0.3  3          0.6               0.50        50      0.6740659   0.20259786
##   0.3  3          0.6               0.50       100      0.6807326   0.20698701
##   0.3  3          0.6               0.50       150      0.6872772   0.21617104
##   0.3  3          0.6               0.75        50      0.6651282   0.14713367
##   0.3  3          0.6               0.75       100      0.6607326   0.13947132
##   0.3  3          0.6               0.75       150      0.6607326   0.14505460
##   0.3  3          0.6               1.00        50      0.6784371   0.16219525
##   0.3  3          0.6               1.00       100      0.6717705   0.14372236
##   0.3  3          0.6               1.00       150      0.6717949   0.14313190
##   0.3  3          0.8               0.50        50      0.6959951   0.23440527
##   0.3  3          0.8               0.50       100      0.7048840   0.26749494
##   0.3  3          0.8               0.50       150      0.6937973   0.24051736
##   0.3  3          0.8               0.75        50      0.6762149   0.18476119
##   0.3  3          0.8               0.75       100      0.6740415   0.17795943
##   0.3  3          0.8               0.75       150      0.6807082   0.19516713
##   0.3  3          0.8               1.00        50      0.6606838   0.10947103
##   0.3  3          0.8               1.00       100      0.6651038   0.13783716
##   0.3  3          0.8               1.00       150      0.6651526   0.13675649
##   0.4  1          0.6               0.50        50      0.5902564   0.01434726
##   0.4  1          0.6               0.50       100      0.6343101   0.13629136
##   0.4  1          0.6               0.50       150      0.6454457   0.17359979
##   0.4  1          0.6               0.75        50      0.6012210   0.02018018
##   0.4  1          0.6               0.75       100      0.6453480   0.16060090
##   0.4  1          0.6               0.75       150      0.6562882   0.18007152
##   0.4  1          0.6               1.00        50      0.5945543  -0.01020432
##   0.4  1          0.6               1.00       100      0.6144811   0.05431029
##   0.4  1          0.6               1.00       150      0.6276923   0.09430014
##   0.4  1          0.8               0.50        50      0.6167033   0.07147824
##   0.4  1          0.8               0.50       100      0.6277656   0.10948056
##   0.4  1          0.8               0.50       150      0.6651282   0.20264658
##   0.4  1          0.8               0.75        50      0.6232967   0.08840062
##   0.4  1          0.8               0.75       100      0.6299634   0.10661153
##   0.4  1          0.8               0.75       150      0.6388767   0.13715347
##   0.4  1          0.8               1.00        50      0.6101343   0.01054855
##   0.4  1          0.8               1.00       100      0.6233455   0.07458927
##   0.4  1          0.8               1.00       150      0.6255189   0.07350157
##   0.4  2          0.6               0.50        50      0.6387057   0.13305782
##   0.4  2          0.6               0.50       100      0.6587302   0.18893277
##   0.4  2          0.6               0.50       150      0.6631258   0.19956340
##   0.4  2          0.6               0.75        50      0.6342857   0.11592014
##   0.4  2          0.6               0.75       100      0.6585348   0.16608439
##   0.4  2          0.6               0.75       150      0.6608059   0.16909425
##   0.4  2          0.6               1.00        50      0.6388278   0.09655057
##   0.4  2          0.6               1.00       100      0.6695726   0.16570314
##   0.4  2          0.6               1.00       150      0.6717460   0.16319353
##   0.4  2          0.8               0.50        50      0.6454457   0.14740821
##   0.4  2          0.8               0.50       100      0.6763370   0.21877314
##   0.4  2          0.8               0.50       150      0.6851526   0.23453710
##   0.4  2          0.8               0.75        50      0.6520147   0.14830491
##   0.4  2          0.8               0.75       100      0.6629548   0.18380401
##   0.4  2          0.8               0.75       150      0.6783639   0.21450927
##   0.4  2          0.8               1.00        50      0.6256899   0.06924968
##   0.4  2          0.8               1.00       100      0.6476679   0.11665842
##   0.4  2          0.8               1.00       150      0.6564591   0.14400806
##   0.4  3          0.6               0.50        50      0.6651526   0.19458545
##   0.4  3          0.6               0.50       100      0.6717460   0.20483072
##   0.4  3          0.6               0.50       150      0.6937485   0.25406111
##   0.4  3          0.6               0.75        50      0.6674481   0.15965126
##   0.4  3          0.6               0.75       100      0.6763370   0.17411914
##   0.4  3          0.6               0.75       150      0.6807326   0.18571779
##   0.4  3          0.6               1.00        50      0.6564103   0.12427286
##   0.4  3          0.6               1.00       100      0.6431990   0.11344816
##   0.4  3          0.6               1.00       150      0.6476190   0.12390111
##   0.4  3          0.8               0.50        50      0.6651526   0.19987697
##   0.4  3          0.8               0.50       100      0.6827839   0.24242412
##   0.4  3          0.8               0.50       150      0.6761661   0.23503793
##   0.4  3          0.8               0.75        50      0.6564591   0.12325398
##   0.4  3          0.8               0.75       100      0.6718926   0.16716183
##   0.4  3          0.8               0.75       150      0.6850061   0.21132636
##   0.4  3          0.8               1.00        50      0.6475702   0.10075691
##   0.4  3          0.8               1.00       100      0.6564103   0.11679435
##   0.4  3          0.8               1.00       150      0.6497680   0.09859060
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample = 0.5.
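The grid above is caret's default search for `xgbTree`. As a hedged sketch, a custom `tuneGrid` narrowed around the selected values could make refits cheaper; caret requires all seven parameters to be present (`xgb_grid` is an illustrative name):

# Narrow the search around the winning combination reported above.
xgb_grid <- expand.grid(nrounds = c(100, 200),
                        max_depth = 3,
                        eta = c(0.1, 0.3),
                        gamma = 0,
                        colsample_bytree = 0.8,
                        min_child_weight = 1,
                        subsample = 0.5)
# xgb_model could then be retrained with tuneGrid = xgb_grid.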
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.652953
FeatEval_Mean_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Mean_mean_accuracy_cv_xgb)
## [1] 0.652953
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Mean_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Mean_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Mean_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Mean_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 112  44
##         CN  16  22
##                                          
##                Accuracy : 0.6907         
##                  95% CI : (0.6205, 0.755)
##     No Information Rate : 0.6598         
##     P-Value [Acc > NIR] : 0.2030127      
##                                          
##                   Kappa : 0.2322         
##                                          
##  Mcnemar's Test P-Value : 0.0004909      
##                                          
##             Sensitivity : 0.8750         
##             Specificity : 0.3333         
##          Pos Pred Value : 0.7179         
##          Neg Pred Value : 0.5789         
##              Prevalence : 0.6598         
##          Detection Rate : 0.5773         
##    Detection Prevalence : 0.8041         
##       Balanced Accuracy : 0.6042         
##                                          
##        'Positive' Class : CI             
## 
cm_FeatEval_Mean_xgb_Accuracy <-cm_FeatEval_Mean_xgb$overall["Accuracy"]
cm_FeatEval_Mean_xgb_Kappa <-cm_FeatEval_Mean_xgb$overall["Kappa"]

print(cm_FeatEval_Mean_xgb_Accuracy)
##  Accuracy 
## 0.6907216
print(cm_FeatEval_Mean_xgb_Kappa)
##   Kappa 
## 0.23219
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## cg03749159  100.00
## age.now      76.15
## cg11331837   70.86
## cg22666875   67.12
## cg02631626   64.79
## cg11019791   59.38
## cg16779438   53.84
## cg04124201   53.50
## cg10240127   52.94
## cg08041188   51.40
## cg12689021   49.32
## cg14168080   48.41
## cg06864789   46.19
## cg23432430   43.97
## cg01008088   43.78
## cg25436480   41.91
## cg26846609   41.47
## cg16431720   41.28
## cg08745107   38.83
## cg16338321   38.03
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover   Frequency   Importance
##          <char>        <num>        <num>       <num>        <num>
##   1: cg03749159 2.954501e-02 0.0147704384 0.010548523 2.954501e-02
##   2:    age.now 2.249708e-02 0.0166658714 0.014767932 2.249708e-02
##   3: cg11331837 2.093539e-02 0.0228889338 0.014767932 2.093539e-02
##   4: cg22666875 1.983190e-02 0.0106072872 0.006329114 1.983190e-02
##   5: cg02631626 1.914240e-02 0.0154894087 0.008438819 1.914240e-02
##  ---                                                              
## 211: cg14710850 1.227445e-04 0.0004908317 0.002109705 1.227445e-04
## 212: cg08896901 8.861563e-05 0.0005177839 0.002109705 8.861563e-05
## 213: cg04664583 8.729897e-05 0.0004251376 0.002109705 8.729897e-05
## 214: cg11706829 6.295243e-05 0.0004132903 0.002109705 6.295243e-05
## 215: cg09451339 1.458784e-05 0.0005991367 0.002109705 1.458784e-05
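A minimal sketch for carrying this ranking into a Top-N selection, assuming `importance` is the `xgb.importance` table printed above (`top_n = 40` is an illustrative value, not a pipeline setting):

# Keep the names of the top-N features ranked by Gain.
top_n <- 40
top_xgb_features <- head(importance[order(-importance$Gain), ]$Feature, top_n)
print(top_xgb_features)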
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7398

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_xgb_AUC <- mean_auc
}
print(FeatEval_Mean_xgb_AUC)
## Area under the curve: 0.7398

9.1.5. Random Forest

9.1.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)


print(rf_model)
## Random Forest 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa      
##     2   0.6608059  0.008374734
##   126   0.6695971  0.060559327
##   250   0.6674725  0.056471002
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 126.
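caret's default search tried only three `mtry` values. As a hedged sketch, a denser grid around sqrt(p) (about 16 for 250 predictors) may be worth checking (`rf_grid` is an illustrative name):

# Evaluate more mtry values between the default extremes.
rf_grid <- expand.grid(mtry = c(8, 16, 32, 64, 126))
# rf_model could then be retrained with tuneGrid = rf_grid.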
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6659585
FeatEval_Mean_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Mean_mean_accuracy_cv_rf)
## [1] 0.6659585
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")


train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Mean_rf_trainAccuracy<-train_accuracy
print(FeatEval_Mean_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Mean_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Mean_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 125  61
##         CN   3   5
##                                           
##                Accuracy : 0.6701          
##                  95% CI : (0.5991, 0.7358)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 0.4131          
##                                           
##                   Kappa : 0.0665          
##                                           
##  Mcnemar's Test P-Value : 1.041e-12       
##                                           
##             Sensitivity : 0.97656         
##             Specificity : 0.07576         
##          Pos Pred Value : 0.67204         
##          Neg Pred Value : 0.62500         
##              Prevalence : 0.65979         
##          Detection Rate : 0.64433         
##    Detection Prevalence : 0.95876         
##       Balanced Accuracy : 0.52616         
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Mean_rf_Accuracy<-cm_FeatEval_Mean_rf$overall["Accuracy"]
print(cm_FeatEval_Mean_rf_Accuracy)
##  Accuracy 
## 0.6701031
cm_FeatEval_Mean_rf_Kappa<-cm_FeatEval_Mean_rf$overall["Kappa"]
print(cm_FeatEval_Mean_rf_Kappa)
##      Kappa 
## 0.06646617
importance_rf_model <- varImp(rf_model)


print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Importance
## cg03749159     100.00
## cg23432430      89.91
## cg01008088      75.78
## cg13405878      73.11
## cg21697769      69.34
## cg06277607      64.80
## cg03982462      64.33
## cg05234269      62.11
## cg06864789      61.78
## cg06697310      61.78
## cg25712921      60.38
## cg11331837      59.36
## cg00696044      58.88
## cg00415024      58.48
## cg03600007      57.05
## cg14170504      56.16
## cg12689021      56.07
## cg01128042      56.01
## cg02887598      54.71
## cg23836570      54.65
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5 ){
  
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)

Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))


print(Ordered_importance_rf_final_model)
  
}
if( METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
  
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)

Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))


print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==3 ){
  
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)

Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))


print(Ordered_importance_rf_final_model)
  
}
##               CI           CN
## 1    4.691019510  4.691019510
## 2    4.028808935  4.028808935
## 3    3.102179448  3.102179448
## 4    2.926732215  2.926732215
## 5    2.679101610  2.679101610
##  ---
## 249 -1.810593575 -1.810593575
## 250 -1.870262044 -1.870262044
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, take the maximum
  # importance value across classes for each feature and
  # add it as a MaxImportance column
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))


  print(importance_rf_model_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")


  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.7016
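
The binary-classification branches above (flags 3 to 6) differ only in which class probability is passed to roc() as the positive class. A minimal refactoring sketch that centralizes this flag-to-class mapping (the helper name positive_class_for_flag is hypothetical); the same pattern applies to the SVM and logistic regression ROC blocks below:

# Hypothetical helper: map METHOD_FEATURE_FLAG to the positive class whose
# predicted probability is fed to roc(); mirrors the if branches above
positive_class_for_flag <- function(flag) {
  switch(as.character(flag),
         "3" = "CI",        # 2-class: CN vs CI
         "4" = "Dementia",  # CN vs AD
         "5" = "MCI",       # CN vs MCI
         "6" = "Dementia",  # MCI vs AD
         stop("unsupported METHOD_FEATURE_FLAG: ", flag))
}
# Usage sketch:
# pos_class <- positive_class_for_flag(METHOD_FEATURE_FLAG)
# roc_curve <- roc(test_data_RFM1$DX, prob_predictions[, pos_class],
#                  levels = rev(levels(test_data_RFM1$DX)))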

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  # use the RF test split here (test_data_RFM1), not the testData split
  # created in an earlier section
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # colour curve i with palette colour i + 1 so the legend colours match the curves
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    
    FeatEval_Mean_rf_AUC <- mean_auc
}
print(FeatEval_Mean_rf_AUC)
## Area under the curve: 0.7016

9.1.6. SVM

9.1.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 364, 363, 363 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.8259829  0.6256205
##   0.50  0.8215629  0.6175088
##   1.00  0.8304029  0.6320648
## 
## Tuning parameter 'sigma' was held constant at a value of 0.002022917
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002022917 and C = 1.
print(svm_model$bestTune)
##         sigma C
## 3 0.002022917 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8259829
FeatEval_Mean_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Mean_mean_accuracy_cv_svm)
## [1] 0.8259829
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.980176211453744"
FeatEval_Mean_svm_trainAccuracy <- train_accuracy
print(FeatEval_Mean_svm_trainAccuracy)
## [1] 0.9801762
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Mean_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Mean_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 111  12
##         CN  17  54
##                                           
##                Accuracy : 0.8505          
##                  95% CI : (0.7924, 0.8975)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 1.742e-09       
##                                           
##                   Kappa : 0.673           
##                                           
##  Mcnemar's Test P-Value : 0.4576          
##                                           
##             Sensitivity : 0.8672          
##             Specificity : 0.8182          
##          Pos Pred Value : 0.9024          
##          Neg Pred Value : 0.7606          
##              Prevalence : 0.6598          
##          Detection Rate : 0.5722          
##    Detection Prevalence : 0.6340          
##       Balanced Accuracy : 0.8427          
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Mean_svm_Accuracy <- cm_FeatEval_Mean_svm$overall["Accuracy"]
cm_FeatEval_Mean_svm_Kappa <- cm_FeatEval_Mean_svm$overall["Kappa"]
print(cm_FeatEval_Mean_svm_Accuracy)
##  Accuracy 
## 0.8505155
print(cm_FeatEval_Mean_svm_Kappa)
##    Kappa 
## 0.673021

Let’s take a look at the feature importance of the trained model.

library(iml)

# Wrap the caret SVM in an iml Predictor (here over the full data set df_SVM),
# then compute permutation feature importance with classification error ("ce") as the loss
predictor_SVM <- Predictor$new(svm_model, data = df_SVM, y = df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM, loss = "ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 251 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg26081710     1.0315789   1.105263      1.126316        0.06481481
## 2 cg11227702     1.0368421   1.078947      1.105263        0.06327160
## 3 cg15535896     0.9684211   1.078947      1.121053        0.06327160
## 4 cg27160885     1.0315789   1.078947      1.105263        0.06327160
## 5 cg07158503     1.0263158   1.052632      1.073684        0.06172840
## 6 cg11331837     0.9578947   1.052632      1.126316        0.06172840
plot(importance_SVM)

library(vip)

# Cross-check with vip's permutation importance: 10 permutations,
# balanced accuracy as the evaluation metric
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
    nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9421
## [1] "The auc vlue is:"
## Area under the curve: 0.9421

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  # use the SVM test split here (test_data_SVM1), not the stale testData split
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # colour curve i with palette colour i + 1 so the legend colours match the curves
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Mean_svm_AUC <- mean_auc
    
}
print(FeatEval_Mean_svm_AUC)
## Area under the curve: 0.9421

9.2 Selected Based on Median

9.2.1 Input Features for Evaluation

Performance of the output features selected based on median feature importance.

processed_dataFrame<-df_selected_Median
processed_data<-output_median_feature

AfterProcess_FeatureName<-Selected_median_imp_Name
print(head(output_median_feature))
## # A tibble: 6 × 251
##   DX    cg23432430      PC3 age.now      PC1        PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880 cg19799454 cg00004073
##   <fct>      <dbl>    <dbl>   <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CI         0.948 -0.0140     82.4 -0.214    0.0147         0.578      0.912      0.748      0.845      0.419      0.869      0.683      0.712      0.480      0.844      0.510      0.918     0.0293
## 2 CN         0.946  0.00506    78.6 -0.173    0.0575         0.620      0.538      0.825      0.865      0.442      0.516      0.827      0.685      0.487      0.855      0.840      0.911     0.0279
## 3 CN         0.942  0.0291     80.4 -0.00367  0.0837         0.624      0.504      0.818      0.241      0.436      0.903      0.521      0.721      0.493      0.779      0.847      0.907     0.646 
## 4 CI         0.943 -0.0323     78.2 -0.187   -0.0112         0.599      0.904      0.758      0.848      0.957      0.531      0.808      0.187      0.855      0.826      0.487      0.922     0.624 
## 5 CI         0.946  0.0529     62.9  0.0268   0.0000165      0.631      0.896      0.826      0.821      0.946      0.926      0.608      0.235      0.488      0.330      0.889      0.914     0.412 
## 6 CN         0.951 -0.00869    80.7 -0.0379   0.0157         0.615      0.886      0.210      0.784      0.399      0.894      0.764      0.730      0.842      0.854      0.906      0.921     0.393 
## # ℹ 232 more variables: cg00154902 <dbl>, cg02887598 <dbl>, cg09727210 <dbl>, cg11227702 <dbl>, cg11331837 <dbl>, cg16338321 <dbl>, cg24851651 <dbl>, cg25208881 <dbl>, cg19503462 <dbl>,
## #   cg03749159 <dbl>, cg03088219 <dbl>, cg26081710 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>, cg12543766 <dbl>, cg19471911 <dbl>, cg11540596 <dbl>, cg01921484 <dbl>, cg00415024 <dbl>,
## #   cg12689021 <dbl>, cg21757617 <dbl>, cg01128042 <dbl>, cg17002719 <dbl>, cg16715186 <dbl>, cg05234269 <dbl>, cg12421087 <dbl>, cg05064044 <dbl>, cg15184869 <dbl>, cg23517115 <dbl>,
## #   cg00819121 <dbl>, cg11019791 <dbl>, cg04156077 <dbl>, cg01910713 <dbl>, cg16779438 <dbl>, cg25169289 <dbl>, cg03979311 <dbl>, cg14710850 <dbl>, cg00648024 <dbl>, cg25712921 <dbl>,
## #   cg27272246 <dbl>, cg18816397 <dbl>, cg18285382 <dbl>, cg08096656 <dbl>, cg15535896 <dbl>, cg13573375 <dbl>, cg20673830 <dbl>, cg26853071 <dbl>, cg15600437 <dbl>, cg16431720 <dbl>,
## #   cg25436480 <dbl>, cg27577781 <dbl>, cg06277607 <dbl>, cg08745107 <dbl>, cg03982462 <dbl>, cg25879395 <dbl>, cg20823859 <dbl>, cg06960717 <dbl>, cg06961873 <dbl>, cg10738648 <dbl>,
## #   cg20685672 <dbl>, cg09584650 <dbl>, cg07640670 <dbl>, cg12702014 <dbl>, cg16858433 <dbl>, cg00512739 <dbl>, cg15098922 <dbl>, cg26679884 <dbl>, cg16536985 <dbl>, cg24883219 <dbl>, …
print(Selected_median_imp_Name)
##   [1] "cg23432430" "PC3"        "age.now"    "PC1"        "PC2"        "cg07158503" "cg00962106" "cg07634717" "cg06697310" "cg14168080" "cg03660162" "cg02225060" "cg07504457" "cg10701746" "cg20678988"
##  [16] "cg09015880" "cg19799454" "cg00004073" "cg00154902" "cg02887598" "cg09727210" "cg11227702" "cg11331837" "cg16338321" "cg24851651" "cg25208881" "cg19503462" "cg03749159" "cg03088219" "cg26081710"
##  [31] "cg09120722" "cg11787167" "cg12543766" "cg19471911" "cg11540596" "cg01921484" "cg00415024" "cg12689021" "cg21757617" "cg01128042" "cg17002719" "cg16715186" "cg05234269" "cg12421087" "cg05064044"
##  [46] "cg15184869" "cg23517115" "cg00819121" "cg11019791" "cg04156077" "cg01910713" "cg16779438" "cg25169289" "cg03979311" "cg14710850" "cg00648024" "cg25712921" "cg27272246" "cg18816397" "cg18285382"
##  [61] "cg08096656" "cg15535896" "cg13573375" "cg20673830" "cg26853071" "cg15600437" "cg16431720" "cg25436480" "cg27577781" "cg06277607" "cg08745107" "cg03982462" "cg25879395" "cg20823859" "cg06960717"
##  [76] "cg06961873" "cg10738648" "cg20685672" "cg09584650" "cg07640670" "cg12702014" "cg16858433" "cg00512739" "cg15098922" "cg26679884" "cg16536985" "cg24883219" "cg05876883" "cg06371647" "cg02823329"
##  [91] "cg12556569" "cg22666875" "cg13387643" "cg09216282" "cg02078724" "cg15700429" "cg17429539" "cg08584917" "cg01608425" "cg08788093" "cg22542451" "cg00084271" "cg21697769" "cg05593887" "cg18918831"
## [106] "cg08198851" "cg22931151" "cg18857647" "cg18150287" "cg00939409" "cg01008088" "cg17723206" "cg05321907" "cg12776173" "cg02932958" "cg09247979" "cg14170504" "cg25306893" "cg25758034" "cg25649515"
## [121] "cg22305850" "cg13405878" "cg14687298" "cg12240569" "cg19301366" "cg05161773" "cg11133939" "cg01933473" "cg26983017" "cg24697433" "cg18993517" "cg02122327" "cg11706829" "cg17906851" "cg17386240"
## [136] "cg15633912" "cg16571124" "cg03549208" "cg02495179" "cg06880438" "cg10681981" "cg13739190" "cg09785377" "cg11438323" "cg22071943" "cg26846609" "cg24634455" "cg01280698" "cg06833284" "cg02668233"
## [151] "cg04831745" "cg00322003" "cg01662749" "cg24307368" "cg04497611" "cg00146240" "cg00696044" "cg02627240" "cg03672288" "cg03737947" "cg04316537" "cg06118351" "cg06403901" "cg06483046" "cg06864789"
## [166] "cg07138269" "cg08554146" "cg08857872" "cg10240127" "cg11187460" "cg11286989" "cg11314779" "cg12228670" "cg13372276" "cg13653328" "cg14293999" "cg14532717" "cg14780448" "cg15730644" "cg15985500"
## [181] "cg17002338" "cg17042243" "cg17738613" "cg18819889" "cg18949721" "cg21986118" "cg23066280" "cg23916408" "cg24139837" "cg25277809" "cg27160885" "cg05392160" "cg02631626" "cg23352245" "cg21139150"
## [196] "cg04124201" "cg10666341" "cg18339359" "cg22169467" "cg04888234" "cg25059696" "cg06715136" "cg03600007" "cg10091792" "cg14192979" "cg20078646" "cg27224751" "cg04412904" "cg17129965" "cg14507637"
## [211] "cg14307563" "cg20981163" "cg22535849" "cg18029737" "cg14627380" "cg10788927" "cg08041188" "cg13226272" "cg11247378" "cg02772171" "cg04462915" "cg03221390" "cg22112152" "cg04664583" "cg20803293"
## [226] "cg09451339" "cg16733676" "cg22741595" "cg04242342" "cg00295418" "cg06012903" "cg00345083" "cg10039445" "cg13368637" "cg04718469" "cg16089727" "cg06231502" "cg02550738" "cg05850457" "cg08896901"
## [241] "cg17268094" "cg01549082" "cg12146221" "cg06394820" "cg26901661" "cg12784167" "cg13815695" "cg01462799" "cg00322820" "cg02356645"
print(head(df_selected_Median))
##                     DX cg23432430          PC3 age.now          PC1        PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880
## 200223270003_R02C01 CI  0.9482702 -0.014043316    82.4 -0.214185447 0.01470293  0.5777146  0.9124898  0.7483382  0.8454609  0.4190123  0.8691767  0.6828159  0.7116230  0.4795503  0.8438718  0.5101716
## 200223270003_R03C01 CN  0.9455418  0.005055871    78.6 -0.172761185 0.05745834  0.6203543  0.5375751  0.8254434  0.8653044  0.4420256  0.5160770  0.8265195  0.6854539  0.4868342  0.8548886  0.8402106
## 200223270003_R06C01 CN  0.9418716  0.029143653    80.4 -0.003667305 0.08372861  0.6236025  0.5040948  0.8181246  0.2405168  0.4355521  0.9026304  0.5209552  0.7205633  0.4927257  0.7786685  0.8472063
##                     cg19799454 cg00004073 cg00154902 cg02887598 cg09727210 cg11227702 cg11331837 cg16338321 cg24851651 cg25208881 cg19503462 cg03749159  cg03088219 cg26081710 cg09120722 cg11787167
## 200223270003_R02C01  0.9178930 0.02928535  0.5137741 0.04020908  0.4240111 0.86486075 0.03692842  0.5350242 0.03674702  0.1851956  0.7951675  0.9355921 0.844002862  0.8751040  0.5878977 0.03853894
## 200223270003_R03C01  0.9106247 0.02787198  0.8540746 0.67073881  0.8812928 0.49184121 0.57150125  0.8294062 0.05358297  0.9092286  0.4537684  0.9153921 0.007435243  0.9198212  0.8287506 0.04673831
## 200223270003_R06C01  0.9066551 0.64576463  0.8188126 0.73408417  0.8493743 0.02543724 0.03182862  0.4918708 0.05968923  0.9265502  0.6997359  0.9255807 0.120155222  0.8801892  0.8793344 0.32564508
##                     cg12543766 cg19471911 cg11540596 cg01921484 cg00415024 cg12689021 cg21757617 cg01128042 cg17002719 cg16715186 cg05234269 cg12421087 cg05064044 cg15184869 cg23517115 cg00819121
## 200223270003_R02C01 0.51028134  0.6334393  0.9238951  0.9098550  0.4299553  0.7706828 0.03652647  0.9113420 0.04939181  0.2742789 0.93848584  0.5647607  0.5672851  0.8622328  0.2151144  0.9207001
## 200223270003_R03C01 0.88741539  0.8437175  0.8926595  0.9093137  0.3999122  0.7449475 0.44299089  0.5328806 0.40466475  0.7946153 0.57461229  0.5399655  0.5358875  0.8996252  0.9131440  0.9281472
## 200223270003_R06C01 0.02818501  0.6127952  0.8820252  0.9204487  0.7465084  0.7872237 0.44725379  0.5222757 0.51428089  0.8124316 0.02467208  0.5400348  0.5273964  0.8688117  0.8328364  0.9327211
##                     cg11019791 cg04156077 cg01910713 cg16779438 cg25169289 cg03979311 cg14710850 cg00648024 cg25712921 cg27272246 cg18816397 cg18285382 cg08096656 cg15535896 cg13573375 cg20673830
## 200223270003_R02C01  0.8112324  0.7321883  0.8573169  0.8826150  0.1100884 0.86644909  0.8048592 0.51410972  0.2829848  0.8615873  0.5472925  0.3202927  0.9362594  0.3382952  0.8670419  0.2422052
## 200223270003_R03C01  0.7831231  0.6865805  0.8538850  0.5466924  0.7667174 0.06199853  0.8090950 0.40202875  0.6220919  0.8705287  0.4940355  0.2930577  0.9314878  0.9253926  0.1733934  0.6881735
## 200223270003_R06C01  0.4353250  0.8501188  0.8110366  0.8629492  0.2264993 0.72615553  0.8285902 0.05579011  0.6384003  0.8103777  0.5337018  0.8923595  0.4943033  0.3320191  0.8888246  0.2134634
##                     cg26853071 cg15600437 cg16431720 cg25436480 cg27577781 cg06277607 cg08745107 cg03982462 cg25879395 cg20823859 cg06960717 cg06961873 cg10738648 cg20685672 cg09584650 cg07640670
## 200223270003_R02C01  0.4233820  0.4885353  0.7356099  0.8425160  0.8143535 0.10744587 0.02921338  0.8562777 0.88130864  0.9030711  0.7030978  0.5335591 0.44931577  0.6712101 0.08230254 0.58296513
## 200223270003_R03C01  0.7451354  0.4894487  0.8692449  0.4994032  0.8113185 0.09353494 0.78542320  0.6023731 0.02603438  0.6062985  0.7653402  0.5472606 0.49894016  0.7932091 0.09661586 0.55225610
## 200223270003_R06C01  0.4228079  0.8551374  0.8773137  0.3494312  0.8144274 0.09504696 0.02709928  0.8778458 0.91060615  0.8917348  0.7206218  0.9415177 0.05552024  0.6613646 0.52399749 0.04058533
##                     cg12702014 cg16858433 cg00512739 cg15098922 cg26679884 cg16536985 cg24883219 cg05876883 cg06371647 cg02823329 cg12556569 cg22666875 cg13387643 cg09216282 cg02078724 cg15700429
## 200223270003_R02C01  0.7704049  0.9184356  0.9337648  0.9286092  0.6793815  0.5789643  0.6430473  0.9039064  0.8336894  0.9462397 0.06218231  0.8177182  0.4229959  0.9349248  0.3096774  0.7879010
## 200223270003_R03C01  0.7848681  0.9194211  0.8863895  0.9027517  0.1848705  0.5418687  0.6822115  0.9223308  0.8198684  0.6464005 0.03924599  0.8291957  0.4200273  0.9244259  0.2896133  0.9114530
## 200223270003_R06C01  0.8065993  0.9271632  0.9242748  0.8525611  0.1701734  0.8392044  0.5296903  0.4697980  0.8069537  0.9633930 0.48636893  0.3694180  0.4161488  0.9263996  0.2805612  0.8838233
##                     cg17429539 cg08584917 cg01608425 cg08788093 cg22542451 cg00084271 cg21697769 cg05593887 cg18918831 cg08198851 cg22931151 cg18857647 cg18150287 cg00939409 cg01008088 cg17723206
## 200223270003_R02C01  0.7860900  0.5663205  0.9030410 0.03911678  0.5884356  0.8103611  0.8946108  0.5939220  0.4891660  0.6578905  0.9311023  0.8582332  0.7685695  0.2652180  0.8424817 0.92881042
## 200223270003_R03C01  0.7100923  0.9019732  0.9264388 0.60934160  0.8337068  0.7877006  0.2822953  0.5766550  0.5333801  0.6578186  0.9356702  0.8394132  0.7519166  0.8882671  0.2417656 0.48556255
## 200223270003_R06C01  0.7660838  0.9187789  0.8887753 0.88380243  0.8125084  0.7706165  0.8698740  0.9148338  0.6406575  0.1272153  0.9328614  0.2647491  0.2501173  0.8842646  0.2618620 0.01765023
##                     cg05321907 cg12776173 cg02932958 cg09247979 cg14170504 cg25306893 cg25758034 cg25649515 cg22305850 cg13405878 cg14687298 cg12240569 cg19301366 cg05161773 cg11133939 cg01933473
## 200223270003_R02C01  0.2880477  0.1038804  0.7901008  0.5070956 0.54915621  0.6265392  0.6114028  0.9279829 0.03361934  0.4549662 0.04206702 0.82772064  0.8831393  0.4120912  0.1282694  0.2589014
## 200223270003_R03C01  0.1782629  0.8730635  0.4210489  0.5706177 0.02236650  0.8330282  0.6649219  0.9235753 0.57522232  0.7858042 0.14813581 0.02690547  0.8072679  0.4154907  0.5920898  0.6726133
## 200223270003_R06C01  0.8427929  0.7009491  0.3825995  0.5090215 0.02988245  0.6175380  0.2393844  0.5895839 0.58548744  0.7583938 0.24260002 0.46030640  0.8796022  0.8526849  0.5127706  0.2642560
##                     cg26983017 cg24697433 cg18993517 cg02122327 cg11706829 cg17906851 cg17386240 cg15633912 cg16571124 cg03549208 cg02495179 cg06880438 cg10681981 cg13739190 cg09785377 cg11438323
## 200223270003_R02C01 0.89868232  0.9243095  0.2091538 0.38940091  0.8897234  0.9488392  0.7473400  0.1605530  0.9282854  0.9014487  0.6813307  0.8285145  0.7035090  0.8510103  0.9162088  0.4863471
## 200223270003_R03C01 0.03145466  0.6808390  0.2665896 0.37769608  0.5444785  0.9529718  0.7144809  0.9333421  0.9206431  0.8381784  0.7373055  0.7988881  0.7382662  0.8358482  0.9226292  0.8984559
## 200223270003_R06C01 0.84677625  0.6384606  0.2574003 0.04017909  0.5669449  0.6462151  0.8074824  0.8737362  0.9276842  0.9097817  0.5588114  0.7839538  0.6971989  0.8419471  0.6405193  0.8722772
##                     cg22071943 cg26846609 cg24634455 cg01280698 cg06833284 cg02668233 cg04831745 cg00322003 cg01662749 cg24307368 cg04497611 cg00146240 cg00696044 cg02627240 cg03672288 cg03737947
## 200223270003_R02C01  0.8705217 0.48860949  0.7796391  0.8985067  0.9125144  0.4708431 0.61984995  0.1759911  0.3506201 0.64323677  0.9086359  0.6336151 0.55608424 0.66706843  0.9235592 0.91824910
## 200223270003_R03C01  0.2442648 0.04878986  0.5188241  0.8846201  0.9003482  0.8841930 0.71214149  0.5702070  0.2510946 0.34980461  0.8818513  0.8957183 0.07552381 0.57129408  0.6718625 0.92067153
## 200223270003_R06C01  0.2644581 0.48026945  0.5325725  0.8847132  0.6097933  0.4575646 0.06871768  0.3077122  0.8061480 0.02720398  0.5853116  0.1433218 0.79270858 0.05309659  0.9007629 0.03638091
##                     cg04316537 cg06118351 cg06403901 cg06483046 cg06864789 cg07138269 cg08554146 cg08857872 cg10240127 cg11187460 cg11286989 cg11314779 cg12228670 cg13372276 cg13653328 cg14293999
## 200223270003_R02C01  0.8074830  0.3633940 0.92790690 0.04383925 0.05369415  0.5002290  0.8982080  0.3395280  0.9250553 0.03672179  0.7590008  0.0242134  0.8632174 0.04888111  0.9245434  0.2836710
## 200223270003_R03C01  0.8453340  0.4714860 0.04783341 0.50720277 0.46053125  0.9426707  0.8963074  0.8181845  0.9403255 0.92516409  0.8533989  0.8966100  0.8496212 0.62396373  0.5122938  0.9172023
## 200223270003_R06C01  0.4351695  0.8655962 0.05253626 0.89604910 0.87513655  0.5057781  0.8213878  0.2970779  0.9056974 0.03109553  0.7313884  0.8908661  0.8738949 0.59693465  0.9362798  0.9168166
##                     cg14532717 cg14780448 cg15730644 cg15985500 cg17002338 cg17042243 cg17738613 cg18819889 cg18949721 cg21986118 cg23066280 cg23916408 cg24139837 cg25277809 cg27160885 cg05392160
## 200223270003_R02C01  0.5732280  0.9119141  0.4803181  0.8555262  0.9286251  0.2502905  0.6879612  0.9156157  0.2334245  0.6658175 0.07247841  0.1942275 0.07404605  0.1632342  0.2231606  0.9328933
## 200223270003_R03C01  0.1107638  0.6702102  0.4353906  0.8312198  0.2684163  0.2933475  0.6582258  0.9004455  0.2437792  0.6571296 0.57174588  0.9154993 0.04183445  0.4913711  0.8263885  0.2576881
## 200223270003_R06C01  0.6273416  0.6207355  0.8763048  0.8492103  0.2811103  0.2725457  0.1022257  0.9054439  0.2523095  0.7034445 0.80814756  0.8886255 0.05657120  0.5952124  0.2121179  0.8920726
##                     cg02631626 cg23352245 cg21139150 cg04124201 cg10666341 cg18339359 cg22169467 cg04888234 cg25059696 cg06715136 cg03600007 cg10091792 cg14192979 cg20078646 cg27224751 cg04412904
## 200223270003_R02C01  0.6280766  0.9377232 0.01853264  0.8686421  0.9046648  0.8824858  0.3095010  0.8379655  0.9017504  0.3400192  0.5658487  0.8670733 0.06336040 0.06198170 0.44503947 0.05088595
## 200223270003_R03C01  0.1951736  0.9375774 0.43223243  0.3308589  0.6731062  0.9040272  0.2978585  0.4376314  0.3047156  0.9259109  0.6018832  0.5864221 0.06019651 0.89537412 0.03214912 0.07717659
## 200223270003_R06C01  0.2699849  0.5932742 0.43772680  0.3241613  0.6443180  0.8552121  0.8955853  0.8039047  0.3051179  0.9079807  0.8611166  0.6087997 0.52114282 0.08725521 0.83123722 0.08253743
##                     cg17129965 cg14507637 cg14307563 cg20981163 cg22535849 cg18029737 cg14627380 cg10788927 cg08041188 cg13226272 cg11247378 cg02772171 cg04462915 cg03221390 cg22112152 cg04664583
## 200223270003_R02C01  0.8972140  0.9051258  0.1855966  0.8990628  0.8847704  0.9100454  0.9455369  0.8973154  0.7752456 0.02637249  0.1591185  0.9182018 0.03224861  0.5859063  0.8476101  0.5572814
## 200223270003_R03C01  0.8806673  0.9009460  0.8916957  0.9264076  0.8609966  0.9016634  0.9258964  0.2021398  0.3201255 0.54100016  0.7874849  0.5660559 0.50740695  0.9180706  0.8014136  0.5881190
## 200223270003_R06C01  0.8857237  0.9013686  0.8750052  0.4874651  0.8808022  0.7376586  0.5789898  0.2053075  0.7900939 0.44370701  0.4807942  0.8995479 0.02700644  0.6399867  0.7897897  0.9352717
##                     cg20803293 cg09451339 cg16733676 cg22741595 cg04242342 cg00295418 cg06012903 cg00345083 cg10039445 cg13368637 cg04718469 cg16089727 cg06231502 cg02550738 cg05850457 cg08896901
## 200223270003_R02C01 0.54933918  0.2243746  0.9057228  0.6525533  0.8206769 0.44954665  0.7964595 0.47960968  0.8833873  0.5597507  0.8687522 0.86748697  0.7784451  0.6201457  0.8183013  0.3581911
## 200223270003_R03C01 0.07935747  0.2340702  0.8904541  0.1730013  0.8167892 0.48471295  0.1933431 0.50833875  0.8954055  0.9100088  0.7256813 0.54996692  0.7964278  0.9011727  0.8313023  0.2467071
## 200223270003_R06C01 0.42191244  0.8921284  0.1698111  0.1550739  0.8040357 0.02004532  0.1960773 0.03929249  0.8832807  0.8739205  0.8521881 0.05876736  0.7706160  0.9085849  0.8161364  0.9225209
##                     cg17268094 cg01549082 cg12146221 cg06394820 cg26901661 cg12784167 cg13815695 cg01462799 cg00322820 cg02356645
## 200223270003_R02C01  0.5774753  0.2924138  0.2049284  0.8513195  0.8951971 0.81503498  0.9267057  0.8284427  0.4869764  0.5105903
## 200223270003_R03C01  0.9003262  0.7065693  0.1814927  0.8695521  0.8754981 0.02811410  0.6859729  0.4038824  0.4858988  0.5833923
## 200223270003_R06C01  0.8789368  0.2895440  0.8619250  0.4415020  0.9021064 0.03073269  0.6509046  0.4676821  0.4754313  0.5701428
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]

9.2.2. Logistic Regression Model

9.2.2.1 Logistic Regression Model Training

df_LRM1<-processed_data 
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)

set.seed(123) 
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 251
dim(testData)
## [1] 194 251
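Note that createDataPartition() stratifies the split on DX, so the 70/30 train/test partitions keep class proportions close to those of the full data set; the class-imbalance check in Section 9.2.2.2.1 below verifies this.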
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")

cm_FeatEval_Median_LRM1<-caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Median_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 120  17
##         CN   8  49
##                                           
##                Accuracy : 0.8711          
##                  95% CI : (0.8157, 0.9148)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 1.656e-11       
##                                           
##                   Kappa : 0.7031          
##                                           
##  Mcnemar's Test P-Value : 0.1096          
##                                           
##             Sensitivity : 0.9375          
##             Specificity : 0.7424          
##          Pos Pred Value : 0.8759          
##          Neg Pred Value : 0.8596          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6186          
##    Detection Prevalence : 0.7062          
##       Balanced Accuracy : 0.8400          
##                                           
##        'Positive' Class : CI              
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Median_LRM1_Accuracy <- cm_FeatEval_Median_LRM1$overall["Accuracy"]
cm_FeatEval_Median_LRM1_Kappa <- cm_FeatEval_Median_LRM1$overall["Kappa"]

print(cm_FeatEval_Median_LRM1_Accuracy)
## Accuracy 
## 0.871134
print(cm_FeatEval_Median_LRM1_Kappa)
##    Kappa 
## 0.703146
print(model_LRM1)
## glmnet 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001769938  0.7994628  0.5470726
##   0.10   0.0017699384  0.8060806  0.5561292
##   0.10   0.0176993845  0.8193162  0.5843843
##   0.55   0.0001769938  0.7884982  0.5192244
##   0.55   0.0017699384  0.7818803  0.5017078
##   0.55   0.0176993845  0.7289621  0.3714212
##   1.00   0.0001769938  0.7686935  0.4709982
##   1.00   0.0017699384  0.7488889  0.4371236
##   1.00   0.0176993845  0.6717460  0.2242205
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01769938.
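As in the SVM section, the chosen tuning pair can also be inspected directly; a minimal sketch (not executed here, so no output is shown):

print(model_LRM1$bestTune)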
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Median_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.997797356828194"
print(FeatEval_Median_LRM1_trainAccuracy)
## [1] 0.9977974
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7681699
FeatEval_Median_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Median_mean_accuracy_cv_LRM1)
## [1] 0.7681699
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <-auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9092
## [1] "The auc value is:"
## Area under the curve: 0.9092

if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  # colour curve i with palette colour i + 1 so the legend colours match the curves
  plot(roc_curves[[1]], col = 2, 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_LRM1_AUC <-mean_auc
}
print(FeatEval_Median_LRM1_AUC)
## Area under the curve: 0.9092
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC3         100.00
## PC1          75.12
## PC2          58.87
## cg23432430   57.15
## cg09727210   54.10
## cg07158503   46.44
## cg00962106   45.17
## cg06697310   42.80
## cg10701746   39.69
## cg09015880   38.90
## cg16338321   38.30
## cg00819121   38.05
## cg02225060   37.05
## cg00415024   36.20
## cg26081710   36.12
## cg21757617   35.91
## cg14168080   35.75
## cg05064044   34.73
## cg02887598   33.32
## cg00004073   32.12
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
##        Overall
## 1   3.18842587
## 2   2.39520051
## 3   1.87706979
## 4   1.82208262
## 5   1.72486003
## 6   1.48063346
## 7   1.44023001
## 8   1.36480429
## 9   1.26563567
## 10  1.24035703
## 11  1.22110887
## 12  1.21305309
## 13  1.18147057
## 14  1.15430972
## 15  1.15152197
## 16  1.14497407
## 17  1.13975248
## 18  1.10722616
## 19  1.06235813
## 20  1.02420300
## 21  1.02185532
## 22  1.01652980
## 23  1.01169312
## 24  0.99959858
## 25  0.99167486
## 26  0.97320357
## 27  0.96863996
## 28  0.96137801
## 29  0.95340981
## 30  0.94516230
## 31  0.94444136
## 32  0.94158925
## 33  0.93561012
## 34  0.93082797
## 35  0.91401062
## 36  0.90745296
## 37  0.90243919
## 38  0.90130081
## 39  0.90117310
## 40  0.89944471
## 41  0.88338684
## 42  0.87896114
## 43  0.87442980
## 44  0.87404942
## 45  0.86190645
## 46  0.85956601
## 47  0.85931524
## 48  0.84726874
## 49  0.84712594
## 50  0.80542049
## 51  0.77666168
## 52  0.77658413
## 53  0.77550563
## 54  0.76983886
## 55  0.76826044
## 56  0.76648581
## 57  0.75910543
## 58  0.75033770
## 59  0.75015499
## 60  0.74765138
## 61  0.74701445
## 62  0.74459931
## 63  0.74292730
## 64  0.73194816
## 65  0.73154206
## 66  0.72943791
## 67  0.72752233
## 68  0.72545029
## 69  0.72530308
## 70  0.71913220
## 71  0.71725145
## 72  0.71325216
## 73  0.71183026
## 74  0.70764556
## 75  0.70437740
## 76  0.70180753
## 77  0.69872001
## 78  0.69818706
## 79  0.68824150
## 80  0.67720915
## 81  0.67693328
## 82  0.67023307
## 83  0.66811642
## 84  0.65461716
## 85  0.64734474
## 86  0.64680347
## 87  0.64387681
## 88  0.63333800
## 89  0.63256915
## 90  0.62861808
## 91  0.62825803
## 92  0.62459285
## 93  0.62387701
## 94  0.62262167
## 95  0.61525029
## 96  0.61173712
## 97  0.60855070
## 98  0.60805850
## 99  0.60228586
## 100 0.59991420
## 101 0.59799939
## 102 0.59625593
## 103 0.58635772
## 104 0.58299740
## 105 0.58222172
## 106 0.57878501
## 107 0.57732288
## 108 0.57357653
## 109 0.56954900
## 110 0.56790810
## 111 0.56096287
## 112 0.56055544
## 113 0.56015427
## 114 0.55346945
## 115 0.55291544
## 116 0.54912038
## 117 0.54188065
## 118 0.53934854
## 119 0.53721784
## 120 0.53164099
## 121 0.52988696
## 122 0.52278157
## 123 0.51772990
## 124 0.51234945
## 125 0.50135775
## 126 0.50048085
## 127 0.49493622
## 128 0.47392268
## 129 0.46924195
## 130 0.46811371
## 131 0.46345297
## 132 0.46248719
## 133 0.46198127
## 134 0.46121369
## 135 0.46044321
## 136 0.45933073
## 137 0.45287300
## 138 0.44969202
## 139 0.44910238
## 140 0.44539579
## 141 0.44441219
## 142 0.43846893
## 143 0.43281983
## 144 0.42764281
## 145 0.42090033
## 146 0.41791079
## 147 0.41610015
## 148 0.40447137
## 149 0.39487293
## 150 0.39477696
## 151 0.39285964
## 152 0.38840457
## 153 0.38704367
## 154 0.37749203
## 155 0.37684829
## 156 0.37550783
## 157 0.37283355
## 158 0.36932751
## 159 0.36926725
## 160 0.36435426
## 161 0.36237676
## 162 0.35967472
## 163 0.35706949
## 164 0.35243710
## 165 0.35081239
## 166 0.34597818
## 167 0.34363845
## 168 0.33604979
## 169 0.33421572
## 170 0.33225173
## 171 0.32909256
## 172 0.32887522
## 173 0.32686496
## 174 0.32677868
## 175 0.32560098
## 176 0.32312631
## 177 0.32177734
## 178 0.32046099
## 179 0.32035202
## 180 0.31386523
## 181 0.31116196
## 182 0.30746456
## 183 0.30434478
## 184 0.29286777
## 185 0.29003664
## 186 0.28938759
## 187 0.28550974
## 188 0.28451909
## 189 0.28349415
## 190 0.28348491
## 191 0.28205446
## 192 0.27800578
## 193 0.27236595
## 194 0.27086073
## 195 0.27047475
## 196 0.26995359
## 197 0.26797697
## 198 0.26198655
## 199 0.26148619
## 200 0.25997417
## 201 0.25974403
## 202 0.25343703
## 203 0.24248192
## 204 0.24091350
## 205 0.22950223
## 206 0.21893151
## 207 0.21500503
## 208 0.20840975
## 209 0.20814535
## 210 0.20792596
## 211 0.20398664
## 212 0.20220817
## 213 0.19352816
## 214 0.19162572
## 215 0.18799061
## 216 0.18733175
## 217 0.18423807
## 218 0.17916954
## 219 0.17718443
## 220 0.17569282
## 221 0.16183559
## 222 0.15690863
## 223 0.15021387
## 224 0.14174255
## 225 0.12645035
## 226 0.11784052
## 227 0.11465985
## 228 0.10090127
## 229 0.09003150
## 230 0.08787500
## 231 0.08696989
## 232 0.08581724
## 233 0.08463400
## 234 0.08178926
## 235 0.07610370
## 236 0.07492226
## 237 0.05059524
## 238 0.02964717
## 239 0.01765970
## 240 0.01508225
## 241 0.00000000
## 242 0.00000000
## 243 0.00000000
## 244 0.00000000
## 245 0.00000000
## 246 0.00000000
## 247 0.00000000
## 248 0.00000000
## 249 0.00000000
## 250 0.00000000
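Note the table above shows only row indices: dplyr::arrange() resets data-frame row names, and varImp() on the glmnet final model stores the feature names in the row names. A minimal sketch (assuming the tibble package is available) that carries the names through the sort:

library(tibble)
# Move the feature names out of the rownames before sorting so they survive arrange()
ordered_importance_final_model_LRM1_named <- importance_final_model_LRM1 %>%
  rownames_to_column("Feature") %>%
  arrange(desc(Overall))
print(head(ordered_importance_final_model_LRM1_named))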
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, take the maximum
  # importance value across classes for each feature and
  # add it as a MaxImportance column
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
} else {
  library(reshape2)
}

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

9.2.2.2 Model Diagnosis & Improvement

9.2.2.2.1 Class Imbalance
Class Imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##  CI  CN 
## 427 221
prop.table(table(df_LRM1$DX))
## 
##        CI        CN 
## 0.6589506 0.3410494
table(trainData$DX)
## 
##  CI  CN 
## 299 155
prop.table(table(trainData$DX))
## 
##        CI        CN 
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training Data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance; here the whole data set gives 427/221 ≈ 1.93.

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance radio of the whole data set is:")
    ## [1] "The imbalance radio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 1.932127
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance radio of the training data set is:")
    ## [1] "The imbalance radio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 1.929032
  • Let’s run a Chi-square test to determine whether the class distribution deviates significantly from a balanced one; the test’s p-value indicates how significant the imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 65.488, df = 1, p-value = 5.848e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 45.674, df = 1, p-value = 1.397e-11
Addressing class imbalance with “SMOTE” (NOT FINALIZED YET; MAY NEED FURTHER IMPROVEMENT)
library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                          target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##  CI  CN 
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 251
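With dup_size = 1, SMOTE here adds one synthetic sample per original minority instance: the 155 CN training rows plus 155 synthetic CN rows give the 310 CN rows above, while the 299 CI rows pass through unchanged (609 rows in total).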
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 117  17
##         CN  11  49
##                                           
##                Accuracy : 0.8557          
##                  95% CI : (0.7982, 0.9019)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 5.787e-10       
##                                           
##                   Kappa : 0.6713          
##                                           
##  Mcnemar's Test P-Value : 0.3447          
##                                           
##             Sensitivity : 0.9141          
##             Specificity : 0.7424          
##          Pos Pred Value : 0.8731          
##          Neg Pred Value : 0.8167          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6031          
##    Detection Prevalence : 0.6907          
##       Balanced Accuracy : 0.8282          
##                                           
##        'Positive' Class : CI              
## 
print(model_LRM2)
## glmnet 
## 
## 609 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 487, 488, 487, 487, 487 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       Accuracy   Kappa    
##   0.10   0.000194718  0.8998510  0.7993574
##   0.10   0.001947180  0.8998510  0.7993487
##   0.10   0.019471799  0.8916272  0.7829100
##   0.55   0.000194718  0.8817775  0.7630141
##   0.55   0.001947180  0.8719279  0.7432252
##   0.55   0.019471799  0.8424333  0.6840738
##   1.00   0.000194718  0.8637041  0.7266518
##   1.00   0.001947180  0.8637312  0.7268088
##   1.00   0.019471799  0.7865872  0.5721279
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.00194718.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8668323
importance_model_LRM2 <- varImp(model_LRM2)

print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC3         100.00
## PC1          56.77
## PC2          38.47
## cg09727210   37.36
## cg23432430   34.79
## cg10701746   32.31
## cg07158503   31.85
## cg06697310   30.72
## cg09015880   29.49
## cg16338321   29.27
## cg00962106   28.49
## cg00819121   27.36
## cg26081710   27.20
## cg05064044   27.11
## cg00154902   26.79
## cg14168080   25.73
## cg01910713   24.94
## cg02225060   24.61
## cg21757617   24.46
## cg00415024   24.33
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4|| METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
##        Overall
## 1   9.39409905
## 2   5.33332095
## 3   3.61401060
## 4   3.50916676
## 5   3.26781163
## 6   3.03499550
## 7   2.99195070
## 8   2.88616468
## 9   2.76989824
## 10  2.74934125
## 11  2.67647712
## 12  2.56998941
## 13  2.55511336
## 14  2.54633955
## 15  2.51702295
## 16  2.41746666
## 17  2.34332478
## 18  2.31152433
## 19  2.29792871
## 20  2.28593634
## 21  2.25446940
## 22  2.24139023
## 23  2.23077992
## 24  2.20179481
## 25  2.20122355
## 26  2.12397515
## 27  2.11859322
## 28  2.11590640
## 29  2.10828412
## 30  2.10425385
## 31  2.00416932
## 32  1.99119244
## 33  1.97013624
## 34  1.94250141
## 35  1.93997768
## 36  1.93910212
## 37  1.92804417
## 38  1.92676735
## 39  1.91039200
## 40  1.89653350
## 41  1.89352775
## 42  1.88870826
## 43  1.85280986
## 44  1.84711558
## 45  1.84181509
## 46  1.82873528
## 47  1.81996631
## 48  1.78534603
## 49  1.75869637
## 50  1.72544261
## 51  1.71635712
## 52  1.69898528
## 53  1.69687931
## 54  1.69632149
## 55  1.63023756
## 56  1.60846815
## 57  1.59731145
## 58  1.58905291
## 59  1.57745226
## 60  1.57203125
## 61  1.55225017
## 62  1.55039935
## 63  1.50865818
## 64  1.50241377
## 65  1.48281286
## 66  1.47930125
## 67  1.47168246
## 68  1.46518546
## 69  1.45948715
## 70  1.45897625
## 71  1.45442584
## 72  1.45407740
## 73  1.45328407
## 74  1.44483896
## 75  1.44292232
## 76  1.42716083
## 77  1.41217219
## 78  1.40908352
## 79  1.40859045
## 80  1.40819006
## 81  1.38214012
## 82  1.38121012
## 83  1.37824336
## 84  1.37760826
## 85  1.37084457
## 86  1.36974365
## 87  1.34747956
## 88  1.34305584
## 89  1.34136305
## 90  1.33965246
## 91  1.33805998
## 92  1.33434058
## 93  1.33384339
## 94  1.33351835
## 95  1.32909066
## 96  1.32083515
## 97  1.30472650
## 98  1.29861097
## 99  1.28820652
## 100 1.28776003
## 101 1.28386112
## 102 1.28300726
## 103 1.28293163
## 104 1.28071810
## 105 1.27353972
## 106 1.23767420
## 107 1.23266459
## 108 1.21644949
## 109 1.21127960
## 110 1.20304181
## 111 1.20242087
## 112 1.19878534
## 113 1.19568771
## 114 1.18972607
## 115 1.16210774
## 116 1.15323473
## 117 1.15144637
## 118 1.13511651
## 119 1.13206409
## 120 1.11512283
## 121 1.10944738
## 122 1.10594906
## 123 1.09388850
## 124 1.08624655
## 125 1.08386893
## 126 1.08220828
## 127 1.06987759
## 128 1.06012236
## 129 1.03575091
## 130 1.01523652
## 131 0.95786320
## 132 0.94104576
## 133 0.94021910
## 134 0.93736289
## 135 0.92760836
## 136 0.92421297
## 137 0.91909747
## 138 0.91698060
## 139 0.91627299
## 140 0.88416158
## 141 0.88380868
## 142 0.87741894
## 143 0.87741495
## 144 0.87638123
## 145 0.84184534
## 146 0.83266341
## 147 0.82835894
## 148 0.82326511
## 149 0.78040929
## 150 0.77993897
## 151 0.77953213
## 152 0.76325238
## 153 0.75088132
## 154 0.73277434
## 155 0.73001267
## 156 0.72760278
## 157 0.72702111
## 158 0.72582586
## 159 0.72244894
## 160 0.71471180
## 161 0.71306226
## 162 0.70819990
## 163 0.70778707
## 164 0.69185295
## 165 0.68534399
## 166 0.67846502
## 167 0.66677025
## 168 0.66062848
## 169 0.65817693
## 170 0.65070486
## 171 0.63422384
## 172 0.62830057
## 173 0.62641976
## 174 0.62616021
## 175 0.62369562
## 176 0.62235936
## 177 0.61739965
## 178 0.60983380
## 179 0.60399371
## 180 0.60183344
## 181 0.60003127
## 182 0.58702488
## 183 0.55998096
## 184 0.54551365
## 185 0.54185617
## 186 0.53906219
## 187 0.53230705
## 188 0.52137529
## 189 0.51280273
## 190 0.51244874
## 191 0.51125652
## 192 0.50642823
## 193 0.49797586
## 194 0.49322074
## 195 0.49014646
## 196 0.46934884
## 197 0.46602585
## 198 0.46523199
## 199 0.45880197
## 200 0.45428680
## 201 0.44422728
## 202 0.42713223
## 203 0.41277787
## 204 0.40876440
## 205 0.40725044
## 206 0.40124370
## 207 0.38229474
## 208 0.37402445
## 209 0.36801628
## 210 0.36627131
## 211 0.36185407
## 212 0.33898771
## 213 0.29501266
## 214 0.28737997
## 215 0.25093751
## 216 0.24307357
## 217 0.23728746
## 218 0.21863158
## 219 0.21121457
## 220 0.18898753
## 221 0.18371601
## 222 0.17637921
## 223 0.17483811
## 224 0.17348756
## 225 0.15479794
## 226 0.15321118
## 227 0.15215028
## 228 0.14061332
## 229 0.12752344
## 230 0.11773079
## 231 0.10923983
## 232 0.10693030
## 233 0.08250856
## 234 0.08076162
## 235 0.07346061
## 236 0.06564513
## 237 0.03601894
## 238 0.03532381
## 239 0.02909394
## 240 0.00691040
## 241 0.00000000
## 242 0.00000000
## 243 0.00000000
## 244 0.00000000
## 245 0.00000000
## 246 0.00000000
## 247 0.00000000
## 248 0.00000000
## 249 0.00000000
## 250 0.00000000
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, keep the maximum
  # importance across the CN / Dementia / MCI columns for each
  # feature, then sort the features by that maximum.
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9042
## [1] "The auc value is:"
## Area under the curve: 0.9042
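
The "Setting direction: controls < cases" message means pROC inferred which factor level to treat as the case class. To make that choice explicit (and silence the message), the level order and direction can be passed directly; a minimal sketch, assuming the CN/CI coding used above:

# Hypothetical explicit call: CN are controls, CI are cases, and
# direction "<" states that control scores are lower than case scores.
roc_curve <- roc(testData$DX, prob_predictions[, "CI"],
                 levels = c("CN", "CI"), direction = "<")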

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # plot every one-vs-rest curve with its own colour, matching the legend
  class_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = class_cols[1], lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}
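
This is the macro-averaged one-vs-rest AUC: every class contributes equally to the mean, regardless of how many samples it has.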

9.2.3. Elastic Net

9.2.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.8369231  0.62861651
##   0      0.05357895  0.8545543  0.66586275
##   0      0.10615789  0.8457387  0.63995848
##   0      0.15873684  0.8434921  0.63267580
##   0      0.21131579  0.8412454  0.62101906
##   0      0.26389474  0.8412698  0.61615330
##   0      0.31647368  0.8324542  0.59121103
##   0      0.36905263  0.8192674  0.55342696
##   0      0.42163158  0.8192674  0.55342696
##   0      0.47421053  0.8060562  0.51338362
##   0      0.52678947  0.8038584  0.50491822
##   0      0.57936842  0.7994628  0.49039759
##   0      0.63194737  0.7950672  0.47776549
##   0      0.68452632  0.7906716  0.46490837
##   0      0.73710526  0.7840537  0.44525210
##   0      0.78968421  0.7818315  0.43824492
##   0      0.84226316  0.7796337  0.42781064
##   0      0.89484211  0.7642247  0.38014916
##   0      0.94742105  0.7598046  0.36607884
##   0      1.00000000  0.7554335  0.34703455
##   1      0.00100000  0.7554823  0.44828417
##   1      0.05357895  0.6564103  0.01945757
##   1      0.10615789  0.6585836  0.00000000
##   1      0.15873684  0.6585836  0.00000000
##   1      0.21131579  0.6585836  0.00000000
##   1      0.26389474  0.6585836  0.00000000
##   1      0.31647368  0.6585836  0.00000000
##   1      0.36905263  0.6585836  0.00000000
##   1      0.42163158  0.6585836  0.00000000
##   1      0.47421053  0.6585836  0.00000000
##   1      0.52678947  0.6585836  0.00000000
##   1      0.57936842  0.6585836  0.00000000
##   1      0.63194737  0.6585836  0.00000000
##   1      0.68452632  0.6585836  0.00000000
##   1      0.73710526  0.6585836  0.00000000
##   1      0.78968421  0.6585836  0.00000000
##   1      0.84226316  0.6585836  0.00000000
##   1      0.89484211  0.6585836  0.00000000
##   1      0.94742105  0.6585836  0.00000000
##   1      1.00000000  0.6585836  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.05357895.
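Note that param_grid only covers the two endpoints alpha = 0 (pure ridge) and alpha = 1 (pure lasso), so no true elastic-net mixture is evaluated. A minimal sketch of a finer, hypothetical grid (param_grid_fine and elastic_net_fine are illustrative names, not run above):

# Hypothetical finer grid: intermediate alpha values give genuine
# elastic-net mixtures between the ridge and lasso endpoints.
param_grid_fine <- expand.grid(alpha = seq(0, 1, by = 0.25),
                               lambda = seq(0.001, 1, length = 20))
elastic_net_fine <- caret::train(DX ~ ., data = trainData_ENM1,
                                 method = "glmnet",
                                 trControl = ctrl,
                                 tuneGrid = param_grid_fine)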
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.7355177
FeatEval_Median_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Median_mean_accuracy_cv_ENM1)
## [1] 0.7355177
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)


FeatEval_Median_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.997797356828194"
print(FeatEval_Median_ENM1_trainAccuracy)
## [1] 0.9977974
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Median_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Median_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 123  16
##         CN   5  50
##                                           
##                Accuracy : 0.8918          
##                  95% CI : (0.8393, 0.9317)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 7.689e-14       
##                                           
##                   Kappa : 0.7487          
##                                           
##  Mcnemar's Test P-Value : 0.0291          
##                                           
##             Sensitivity : 0.9609          
##             Specificity : 0.7576          
##          Pos Pred Value : 0.8849          
##          Neg Pred Value : 0.9091          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6340          
##    Detection Prevalence : 0.7165          
##       Balanced Accuracy : 0.8593          
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Median_ENM1_Accuracy<-cm_FeatEval_Median_ENM1$overall["Accuracy"]
cm_FeatEval_Median_ENM1_Kappa<-cm_FeatEval_Median_ENM1$overall["Kappa"]
print(cm_FeatEval_Median_ENM1_Accuracy)
##  Accuracy 
## 0.8917526
print(cm_FeatEval_Median_ENM1_Kappa)
##     Kappa 
## 0.7487357
importance_elastic_net_model1<- varImp(elastic_net_model1)


print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## PC3         100.00
## PC2          76.08
## PC1          70.24
## cg23432430   55.07
## cg09727210   47.87
## cg07158503   45.65
## cg00962106   44.80
## cg06697310   40.39
## cg02225060   37.48
## cg16338321   37.33
## cg26081710   36.09
## cg00819121   36.07
## cg00415024   35.83
## cg09015880   35.12
## cg05064044   35.03
## cg10701746   34.38
## cg21757617   33.77
## cg00004073   32.82
## cg07504457   32.74
## cg06277607   32.40
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG ==4 ||  METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
  
  
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)

Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))


print(Ordered_importance_elastic_net_final_model1) 
  
}
##         Overall
## 1   2.237153316
## 2   1.704040405
## 3   1.573965466
## 4   1.235986222
## 5   1.075567910
## 6   1.025970043
## 7   1.007096020
## 8   0.908827880
## 9   0.843943470
## 10  0.840597048
## 11  0.812985211
## 12  0.812481924
## 13  0.807145398
## 14  0.791400263
## 15  0.789313242
## 16  0.774890174
## 17  0.761323163
## 18  0.740218154
## 19  0.738421596
## 20  0.730705360
## 21  0.729944163
## 22  0.726527624
## 23  0.724879801
## 24  0.714758473
## 25  0.706189132
## 26  0.695989206
## 27  0.691363721
## 28  0.690224818
## 29  0.688758365
## 30  0.687144760
## 31  0.687003742
## 32  0.686685451
## 33  0.673996831
## 34  0.665194458
## 35  0.661584084
## 36  0.656989250
## 37  0.651083011
## 38  0.647397986
## 39  0.643699804
## 40  0.632228957
## 41  0.631206405
## 42  0.629710163
## 43  0.623651021
## 44  0.614335337
## 45  0.611002823
## 46  0.604776522
## 47  0.603145997
## 48  0.601261672
## 49  0.598003845
## 50  0.594839526
## 51  0.589913525
## 52  0.588315692
## 53  0.587640055
## 54  0.586835887
## 55  0.585731266
## 56  0.585043673
## 57  0.584145480
## 58  0.581789994
## 59  0.581035210
## 60  0.578956245
## 61  0.567481436
## 62  0.565939326
## 63  0.561721903
## 64  0.560888478
## 65  0.557207726
## 66  0.551410875
## 67  0.548207238
## 68  0.546593627
## 69  0.544378004
## 70  0.541465977
## 71  0.540479426
## 72  0.539738213
## 73  0.538037464
## 74  0.537128971
## 75  0.535156116
## 76  0.532385153
## 77  0.525161009
## 78  0.525029670
## 79  0.524070961
## 80  0.514925239
## 81  0.508845017
## 82  0.507032774
## 83  0.505712205
## 84  0.504360045
## 85  0.501845525
## 86  0.500001082
## 87  0.493794160
## 88  0.492896575
## 89  0.492167725
## 90  0.491383092
## 91  0.488709588
## 92  0.487820601
## 93  0.487623880
## 94  0.485944046
## 95  0.484692117
## 96  0.481022524
## 97  0.478674193
## 98  0.476638858
## 99  0.476231503
## 100 0.468057321
## 101 0.467062660
## 102 0.467052683
## 103 0.465093828
## 104 0.463765669
## 105 0.457111076
## 106 0.456553318
## 107 0.455677586
## 108 0.454856421
## 109 0.454502834
## 110 0.454283502
## 111 0.450787228
## 112 0.443307620
## 113 0.441211657
## 114 0.439186488
## 115 0.436290735
## 116 0.436283372
## 117 0.435181128
## 118 0.432424025
## 119 0.429630520
## 120 0.427576252
## 121 0.422019555
## 122 0.414341777
## 123 0.410262786
## 124 0.409469237
## 125 0.409181517
## 126 0.409005727
## 127 0.408635271
## 128 0.407303282
## 129 0.403279581
## 130 0.398823947
## 131 0.398394952
## 132 0.397100864
## 133 0.392021640
## 134 0.389207444
## 135 0.389153165
## 136 0.387841581
## 137 0.381068904
## 138 0.380865637
## 139 0.376759817
## 140 0.376073665
## 141 0.376029402
## 142 0.375020152
## 143 0.374455168
## 144 0.374042625
## 145 0.370596848
## 146 0.360735096
## 147 0.359465238
## 148 0.358067574
## 149 0.347099725
## 150 0.345987364
## 151 0.344197399
## 152 0.343836461
## 153 0.335781742
## 154 0.334182931
## 155 0.333390098
## 156 0.329732553
## 157 0.329093434
## 158 0.328914541
## 159 0.324752633
## 160 0.324014688
## 161 0.323309477
## 162 0.322211875
## 163 0.321738963
## 164 0.320470675
## 165 0.319463888
## 166 0.318001860
## 167 0.317813828
## 168 0.317309856
## 169 0.315911278
## 170 0.310908449
## 171 0.310296236
## 172 0.307098995
## 173 0.306865321
## 174 0.305040266
## 175 0.303799404
## 176 0.303402188
## 177 0.302901454
## 178 0.302786919
## 179 0.301771418
## 180 0.300685211
## 181 0.300297618
## 182 0.297942871
## 183 0.295600444
## 184 0.291987869
## 185 0.291820929
## 186 0.287155893
## 187 0.286586631
## 188 0.286028321
## 189 0.285948624
## 190 0.284394197
## 191 0.283220362
## 192 0.273915443
## 193 0.272224306
## 194 0.270708050
## 195 0.264361742
## 196 0.261223254
## 197 0.258589999
## 198 0.257450469
## 199 0.256924700
## 200 0.253267758
## 201 0.253099348
## 202 0.252900800
## 203 0.252665230
## 204 0.252021923
## 205 0.251622787
## 206 0.251170397
## 207 0.250901868
## 208 0.246752605
## 209 0.242207738
## 210 0.240875969
## 211 0.238734241
## 212 0.238147174
## 213 0.236636925
## 214 0.234824424
## 215 0.233831159
## 216 0.233796076
## 217 0.231916137
## 218 0.231873255
## 219 0.230148809
## 220 0.226269264
## 221 0.224988816
## 222 0.217717950
## 223 0.207813501
## 224 0.200880311
## 225 0.199912806
## 226 0.199330762
## 227 0.199263674
## 228 0.192905184
## 229 0.189981299
## 230 0.185566344
## 231 0.180842016
## 232 0.180612879
## 233 0.177355598
## 234 0.177303096
## 235 0.172482660
## 236 0.165725965
## 237 0.161592088
## 238 0.157547264
## 239 0.154069286
## 240 0.145073143
## 241 0.139702732
## 242 0.132940855
## 243 0.107622914
## 244 0.093195960
## 245 0.083808507
## 246 0.068226558
## 247 0.038284346
## 248 0.011174598
## 249 0.011093351
## 250 0.008817437
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, keep the maximum
  # importance across the CN / Dementia / MCI columns for each
  # feature, then sort the features by that maximum.
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))


  print(importance_elastic_net_model1_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_ENM1_AUC <- auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_ENM1_AUC <- auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_ENM1_AUC <- auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.9244

if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # plot every one-vs-rest curve with its own colour, matching the legend
  class_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = class_cols[1], lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_ENM1_AUC <- mean_auc
}
print(FeatEval_Median_ENM1_AUC)
## Area under the curve: 0.9244

9.2.4. XGBoost

9.2.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa       
##   0.3  1          0.6               0.50        50      0.6057387   0.041449103
##   0.3  1          0.6               0.50       100      0.6475214   0.161924138
##   0.3  1          0.6               0.50       150      0.6585836   0.185782714
##   0.3  1          0.6               0.75        50      0.5990476  -0.002216854
##   0.3  1          0.6               0.75       100      0.6364591   0.099264033
##   0.3  1          0.6               0.75       150      0.6387302   0.123001640
##   0.3  1          0.6               1.00        50      0.5946764  -0.060252563
##   0.3  1          0.6               1.00       100      0.6100611   0.014960886
##   0.3  1          0.6               1.00       150      0.6122589   0.028201790
##   0.3  1          0.8               0.50        50      0.6057875   0.048356602
##   0.3  1          0.8               0.50       100      0.6146276   0.099079991
##   0.3  1          0.8               0.50       150      0.6519902   0.183355781
##   0.3  1          0.8               0.75        50      0.6122589   0.028983493
##   0.3  1          0.8               0.75       100      0.6540415   0.138948595
##   0.3  1          0.8               0.75       150      0.6452991   0.126957163
##   0.3  1          0.8               1.00        50      0.6035165  -0.026984198
##   0.3  1          0.8               1.00       100      0.6167033   0.022108852
##   0.3  1          0.8               1.00       150      0.6079365   0.017623023
##   0.3  2          0.6               0.50        50      0.6410501   0.142621607
##   0.3  2          0.6               0.50       100      0.6807570   0.237143541
##   0.3  2          0.6               0.50       150      0.6653236   0.184957303
##   0.3  2          0.6               0.75        50      0.6387057   0.120043040
##   0.3  2          0.6               0.75       100      0.6585836   0.169771926
##   0.3  2          0.6               0.75       150      0.6629792   0.179529185
##   0.3  2          0.6               1.00        50      0.6079121  -0.007222195
##   0.3  2          0.6               1.00       100      0.6277656   0.055892538
##   0.3  2          0.6               1.00       150      0.6299389   0.053985694
##   0.3  2          0.8               0.50        50      0.6586569   0.156766525
##   0.3  2          0.8               0.50       100      0.6740171   0.194537184
##   0.3  2          0.8               0.50       150      0.6871795   0.229799561
##   0.3  2          0.8               0.75        50      0.6365812   0.091005548
##   0.3  2          0.8               0.75       100      0.6695238   0.178366874
##   0.3  2          0.8               0.75       150      0.6740415   0.199679968
##   0.3  2          0.8               1.00        50      0.6387546   0.061912505
##   0.3  2          0.8               1.00       100      0.6498413   0.085035944
##   0.3  2          0.8               1.00       150      0.6431990   0.091790854
##   0.3  3          0.6               0.50        50      0.6652503   0.180896232
##   0.3  3          0.6               0.50       100      0.6762882   0.202648627
##   0.3  3          0.6               0.50       150      0.6938706   0.246462463
##   0.3  3          0.6               0.75        50      0.6430525   0.108653792
##   0.3  3          0.6               0.75       100      0.6518437   0.119521656
##   0.3  3          0.6               0.75       150      0.6651038   0.154071522
##   0.3  3          0.6               1.00        50      0.6189988   0.022458659
##   0.3  3          0.6               1.00       100      0.6277900   0.044089263
##   0.3  3          0.6               1.00       150      0.6321612   0.068121586
##   0.3  3          0.8               0.50        50      0.6870818   0.220702527
##   0.3  3          0.8               0.50       100      0.6849817   0.215554491
##   0.3  3          0.8               0.50       150      0.7026618   0.265348891
##   0.3  3          0.8               0.75        50      0.6695238   0.160209824
##   0.3  3          0.8               0.75       100      0.6784127   0.181713886
##   0.3  3          0.8               0.75       150      0.6762393   0.183923734
##   0.3  3          0.8               1.00        50      0.6322100   0.055333341
##   0.3  3          0.8               1.00       100      0.6542369   0.105705666
##   0.3  3          0.8               1.00       150      0.6564103   0.123084024
##   0.4  1          0.6               0.50        50      0.6233211   0.097160524
##   0.4  1          0.6               0.50       100      0.6321368   0.135374849
##   0.4  1          0.6               0.50       150      0.6606838   0.202568903
##   0.4  1          0.6               0.75        50      0.6298901   0.066392648
##   0.4  1          0.6               0.75       100      0.6365079   0.101960122
##   0.4  1          0.6               0.75       150      0.6629548   0.170898766
##   0.4  1          0.6               1.00        50      0.5902076  -0.032360824
##   0.4  1          0.6               1.00       100      0.6276679   0.065178853
##   0.4  1          0.6               1.00       150      0.6121856   0.050628791
##   0.4  1          0.8               0.50        50      0.6034188   0.042725924
##   0.4  1          0.8               0.50       100      0.6210989   0.106274256
##   0.4  1          0.8               0.50       150      0.6518926   0.175454680
##   0.4  1          0.8               0.75        50      0.6034432   0.031972486
##   0.4  1          0.8               0.75       100      0.6254945   0.105825417
##   0.4  1          0.8               0.75       150      0.6497436   0.168478092
##   0.4  1          0.8               1.00        50      0.6034676  -0.015357775
##   0.4  1          0.8               1.00       100      0.6144322   0.045881839
##   0.4  1          0.8               1.00       150      0.6299389   0.087664407
##   0.4  2          0.6               0.50        50      0.7027350   0.285469993
##   0.4  2          0.6               0.50       100      0.7005861   0.266987845
##   0.4  2          0.6               0.50       150      0.7049817   0.282783800
##   0.4  2          0.6               0.75        50      0.6210989   0.073751328
##   0.4  2          0.6               0.75       100      0.6563614   0.152656044
##   0.4  2          0.6               0.75       150      0.6630281   0.170078589
##   0.4  2          0.6               1.00        50      0.6498168   0.108568778
##   0.4  2          0.6               1.00       100      0.6409768   0.100664179
##   0.4  2          0.6               1.00       150      0.6431502   0.094878959
##   0.4  2          0.8               0.50        50      0.6630037   0.195785877
##   0.4  2          0.8               0.50       100      0.6652015   0.192449948
##   0.4  2          0.8               0.50       150      0.6586325   0.175156997
##   0.4  2          0.8               0.75        50      0.6431990   0.122314903
##   0.4  2          0.8               0.75       100      0.6585836   0.160388809
##   0.4  2          0.8               0.75       150      0.6740171   0.188408737
##   0.4  2          0.8               1.00        50      0.6276923   0.051743625
##   0.4  2          0.8               1.00       100      0.6210989   0.056338042
##   0.4  2          0.8               1.00       150      0.6386569   0.097140541
##   0.4  3          0.6               0.50        50      0.6629548   0.187742227
##   0.4  3          0.6               0.50       100      0.6717705   0.214340749
##   0.4  3          0.6               0.50       150      0.6739438   0.216292719
##   0.4  3          0.6               0.75        50      0.6673504   0.158599829
##   0.4  3          0.6               0.75       100      0.6850549   0.193442262
##   0.4  3          0.6               0.75       150      0.6784615   0.186845647
##   0.4  3          0.6               1.00        50      0.6476679   0.087171870
##   0.4  3          0.6               1.00       100      0.6454212   0.092689221
##   0.4  3          0.6               1.00       150      0.6498413   0.109984145
##   0.4  3          0.8               0.50        50      0.6697436   0.199983622
##   0.4  3          0.8               0.50       100      0.6807326   0.227575001
##   0.4  3          0.8               0.50       150      0.6652747   0.196161747
##   0.4  3          0.8               0.75        50      0.6365568   0.104708592
##   0.4  3          0.8               0.75       100      0.6322100   0.089533084
##   0.4  3          0.8               0.75       150      0.6410256   0.113156839
##   0.4  3          0.8               1.00        50      0.6145543   0.020490574
##   0.4  3          0.8               1.00       100      0.6299145   0.055059864
##   0.4  3          0.8               1.00       150      0.6365079   0.083696706
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 2, eta = 0.4, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
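The selected configuration sits at the boundary of caret's default grid (nrounds = 150 is the largest value tried), so a refined search around it may be worthwhile. A minimal sketch, with illustrative names xgb_grid_refined / xgb_refined (not run above):

# Hypothetical refinement around the winning configuration
# (max_depth = 2, subsample = 0.5, colsample_bytree = 0.6), extending
# nrounds beyond the default grid boundary.
xgb_grid_refined <- expand.grid(
  nrounds = c(150, 250, 400), eta = c(0.2, 0.3, 0.4),
  max_depth = 2, gamma = 0, colsample_bytree = 0.6,
  min_child_weight = 1, subsample = 0.5)
xgb_refined <- caret::train(DX ~ ., data = trainData_XGB1,
                            method = "xgbTree", trControl = cv_control,
                            metric = "Accuracy",
                            tuneGrid = xgb_grid_refined)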
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6460783
FeatEval_Median_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Median_mean_accuracy_cv_xgb)
## [1] 0.6460783
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Median_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Median_xgb_trainAccuracy)
## [1] 1
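
A training accuracy of 1 against a best cross-validated accuracy of about 0.70 indicates the boosted trees memorize the training split; the held-out test performance below is the more trustworthy estimate.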
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Median_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Median_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 110  46
##         CN  18  20
##                                           
##                Accuracy : 0.6701          
##                  95% CI : (0.5991, 0.7358)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 0.4131203       
##                                           
##                   Kappa : 0.181           
##                                           
##  Mcnemar's Test P-Value : 0.0007382       
##                                           
##             Sensitivity : 0.8594          
##             Specificity : 0.3030          
##          Pos Pred Value : 0.7051          
##          Neg Pred Value : 0.5263          
##              Prevalence : 0.6598          
##          Detection Rate : 0.5670          
##    Detection Prevalence : 0.8041          
##       Balanced Accuracy : 0.5812          
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Median_xgb_Accuracy <-cm_FeatEval_Median_xgb$overall["Accuracy"]
cm_FeatEval_Median_xgb_Kappa <-cm_FeatEval_Median_xgb$overall["Kappa"]

print(cm_FeatEval_Median_xgb_Accuracy)
##  Accuracy 
## 0.6701031
print(cm_FeatEval_Median_xgb_Kappa)
##     Kappa 
## 0.1810026
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Overall
## cg19301366  100.00
## cg16431720   89.85
## cg02932958   87.78
## cg03982462   85.06
## cg25208881   84.98
## cg01008088   82.22
## cg20803293   77.23
## cg23432430   75.15
## age.now      73.22
## cg03749159   70.38
## cg07158503   68.72
## cg00004073   67.23
## cg17042243   67.12
## cg18918831   66.24
## cg22666875   63.15
## cg01128042   63.11
## cg24139837   62.80
## cg09584650   62.37
## cg24851651   61.87
## cg02887598   61.29
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover   Frequency   Importance
##          <char>        <num>        <num>       <num>        <num>
##   1: cg19301366 1.984747e-02 0.0156385077 0.013089005 1.984747e-02
##   2: cg16431720 1.783360e-02 0.0144094438 0.005235602 1.783360e-02
##   3: cg02932958 1.742185e-02 0.0110611712 0.010471204 1.742185e-02
##   4: cg03982462 1.688218e-02 0.0118235203 0.007853403 1.688218e-02
##   5: cg25208881 1.686588e-02 0.0195031566 0.010471204 1.686588e-02
##  ---                                                              
## 190: cg15700429 1.442304e-04 0.0005172332 0.002617801 1.442304e-04
## 191: cg06403901 1.387303e-04 0.0004864023 0.002617801 1.387303e-04
## 192: cg24883219 1.136401e-04 0.0005453364 0.002617801 1.136401e-04
## 193: cg23916408 9.287332e-05 0.0005502390 0.002617801 9.287332e-05
## 194: cg18339359 7.924531e-05 0.0005191987 0.002617801 7.924531e-05
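In this table the Importance column mirrors Gain, the average loss reduction contributed by splits on a feature, which is the usual ranking criterion for tree models. A minimal sketch of keeping the 20 top-Gain CpGs (top_n_xgb is an illustrative name):

# Hypothetical helper: the 20 features with the largest Gain.
top_n_xgb <- head(importance$Feature[order(-importance$Gain)], 20)
print(top_n_xgb)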
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_xgb_AUC <-auc_value
  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7262

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # plot every one-vs-rest curve with its own colour, matching the legend
  class_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = class_cols[1], lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_xgb_AUC <-mean_auc
}
print(FeatEval_Median_xgb_AUC)
## Area under the curve: 0.7262

9.2.5. Random Forest

9.2.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.6585836  0.00000000
##   126   0.6630037  0.03883596
##   250   0.6652015  0.04322784
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 250.
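Only the three default mtry values were tried, and the grid endpoint (mtry = 250, i.e. all predictors) won. A hypothetical explicit grid spanning from the usual classification default of roughly sqrt(p) ≈ 16 up to all 250 predictors could be supplied instead (rf_grid and rf_model_tuned are illustrative names, not run above):

# Hypothetical explicit mtry grid for the random forest.
rf_grid <- expand.grid(mtry = c(16, 32, 64, 125, 250))
rf_model_tuned <- caret::train(DX ~ ., data = train_data_RFM1,
                               method = "rf", trControl = ctrl,
                               metric = "Accuracy", importance = TRUE,
                               tuneGrid = rf_grid)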
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6622629
FeatEval_Median_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Median_mean_accuracy_cv_rf)
## [1] 0.6622629
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")

train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Median_rf_trainAccuracy<-train_accuracy
print(FeatEval_Median_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Median_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Median_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 126  65
##         CN   2   1
##                                           
##                Accuracy : 0.6546          
##                  95% CI : (0.5832, 0.7213)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 0.5928          
##                                           
##                   Kappa : -6e-04          
##                                           
##  Mcnemar's Test P-Value : 3.605e-14       
##                                           
##             Sensitivity : 0.98438         
##             Specificity : 0.01515         
##          Pos Pred Value : 0.65969         
##          Neg Pred Value : 0.33333         
##              Prevalence : 0.65979         
##          Detection Rate : 0.64948         
##    Detection Prevalence : 0.98454         
##       Balanced Accuracy : 0.49976         
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Median_rf_Accuracy<-cm_FeatEval_Median_rf$overall["Accuracy"]
print(cm_FeatEval_Median_rf_Accuracy)
##  Accuracy 
## 0.6546392
cm_FeatEval_Median_rf_Kappa<-cm_FeatEval_Median_rf$overall["Kappa"]
print(cm_FeatEval_Median_rf_Kappa)
##         Kappa 
## -0.0006158584
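
With a Kappa of essentially zero and a specificity of about 0.015, the random forest has collapsed to predicting the majority class (CI) for nearly every test sample; its 0.65 accuracy simply tracks the No Information Rate (0.6598).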
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 250)
## 
##            Importance
## cg23432430     100.00
## cg01008088      92.27
## cg12689021      85.29
## cg03749159      84.28
## cg03982462      81.55
## cg21697769      80.83
## cg00415024      80.34
## cg11331837      80.10
## cg25712921      75.73
## cg14532717      74.01
## cg18816397      72.29
## cg02225060      70.83
## cg09584650      70.25
## cg11133939      70.04
## cg22741595      68.68
## cg04124201      67.72
## age.now         67.03
## cg06277607      66.78
## cg14627380      66.04
## cg19503462      65.96
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5 ){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))

print(Ordered_importance_rf_final_model)
  
}
if(METHOD_FEATURE_FLAG==3 ){
  
importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))

print(Ordered_importance_rf_final_model)
  
}
##               CI           CN
## 1    3.565502830  3.565502830
## 2    3.134853250  3.134853250
## 3    2.745970277  2.745970277
## 4    2.689900677  2.689900677
## 5    2.537701437  2.537701437
## 6    2.497781007  2.497781007
## 7    2.470555726  2.470555726
## 8    2.457029673  2.457029673
## 9    2.213650261  2.213650261
## 10   2.117858787  2.117858787
## 11   2.021980158  2.021980158
## 12   1.940900761  1.940900761
## 13   1.908458625  1.908458625
## 14   1.896834030  1.896834030
## 15   1.820665430  1.820665430
## 16   1.767235051  1.767235051
## 17   1.729038283  1.729038283
## 18   1.714850883  1.714850883
## 19   1.673634257  1.673634257
## 20   1.669379953  1.669379953
## 21   1.657081268  1.657081268
## 22   1.627366098  1.627366098
## 23   1.579386737  1.579386737
## 24   1.532778914  1.532778914
## 25   1.501085628  1.501085628
## 26   1.485415098  1.485415098
## 27   1.369043229  1.369043229
## 28   1.327241845  1.327241845
## 29   1.320774873  1.320774873
## 30   1.270400952  1.270400952
## 31   1.201196999  1.201196999
## 32   1.188966284  1.188966284
## 33   1.176450859  1.176450859
## 34   1.165964185  1.165964185
## 35   1.157227016  1.157227016
## 36   1.147753406  1.147753406
## 37   1.098535364  1.098535364
## 38   1.096007497  1.096007497
## 39   1.071522386  1.071522386
## 40   1.048267646  1.048267646
## 41   1.038958046  1.038958046
## 42   1.010980043  1.010980043
## 43   0.970416727  0.970416727
## 44   0.891804279  0.891804279
## 45   0.877098864  0.877098864
## 46   0.876969403  0.876969403
## 47   0.876544570  0.876544570
## 48   0.867136389  0.867136389
## 49   0.857786621  0.857786621
## 50   0.831687813  0.831687813
## 51   0.826993278  0.826993278
## 52   0.824939717  0.824939717
## 53   0.778395058  0.778395058
## 54   0.750806999  0.750806999
## 55   0.741740759  0.741740759
## 56   0.733745005  0.733745005
## 57   0.731460979  0.731460979
## 58   0.724541713  0.724541713
## 59   0.699051112  0.699051112
## 60   0.694237512  0.694237512
## 61   0.690163868  0.690163868
## 62   0.680220669  0.680220669
## 63   0.657253952  0.657253952
## 64   0.648455126  0.648455126
## 65   0.643483860  0.643483860
## 66   0.632993341  0.632993341
## 67   0.630438255  0.630438255
## 68   0.597038053  0.597038053
## 69   0.589404743  0.589404743
## 70   0.584155928  0.584155928
## 71   0.572884477  0.572884477
## 72   0.563141399  0.563141399
## 73   0.555980718  0.555980718
## 74   0.551908135  0.551908135
## 75   0.550253864  0.550253864
## 76   0.536546406  0.536546406
## 77   0.524032359  0.524032359
## 78   0.511137748  0.511137748
## 79   0.510816441  0.510816441
## 80   0.507131009  0.507131009
## 81   0.504548755  0.504548755
## 82   0.499919055  0.499919055
## 83   0.490451802  0.490451802
## 84   0.486433745  0.486433745
## 85   0.485274027  0.485274027
## 86   0.476269728  0.476269728
## 87   0.469335209  0.469335209
## 88   0.468784583  0.468784583
## 89   0.468449671  0.468449671
## 90   0.460585518  0.460585518
## 91   0.450600333  0.450600333
## 92   0.423793465  0.423793465
## 93   0.415143164  0.415143164
## 94   0.413377477  0.413377477
## 95   0.412938238  0.412938238
## 96   0.410846061  0.410846061
## 97   0.404453887  0.404453887
## 98   0.401609387  0.401609387
## 99   0.394009601  0.394009601
## 100  0.393117234  0.393117234
## 101  0.374585984  0.374585984
## 102  0.373121943  0.373121943
## 103  0.364309072  0.364309072
## 104  0.344798048  0.344798048
## 105  0.334672137  0.334672137
## 106  0.332217941  0.332217941
## 107  0.324652907  0.324652907
## 108  0.319116320  0.319116320
## 109  0.312654559  0.312654559
## 110  0.311954820  0.311954820
## 111  0.285678630  0.285678630
## 112  0.281041336  0.281041336
## 113  0.269305208  0.269305208
## 114  0.257814624  0.257814624
## 115  0.253422556  0.253422556
## 116  0.240617075  0.240617075
## 117  0.238372852  0.238372852
## 118  0.237330058  0.237330058
## 119  0.228604751  0.228604751
## 120  0.213910585  0.213910585
## 121  0.206347831  0.206347831
## 122  0.203101740  0.203101740
## 123  0.202406756  0.202406756
## 124  0.201089595  0.201089595
## 125  0.195892957  0.195892957
## 126  0.182603353  0.182603353
## 127  0.165531773  0.165531773
## 128  0.164753781  0.164753781
## 129  0.163876139  0.163876139
## 130  0.157088512  0.157088512
## 131  0.149225794  0.149225794
## 132  0.144930720  0.144930720
## 133  0.120876624  0.120876624
## 134  0.103731010  0.103731010
## 135  0.098185429  0.098185429
## 136  0.092031187  0.092031187
## 137  0.017913420  0.017913420
## 138 -0.000727938 -0.000727938
## 139 -0.004517352 -0.004517352
## 140 -0.004537166 -0.004537166
## 141 -0.006675953 -0.006675953
## 142 -0.008218033 -0.008218033
## 143 -0.010893766 -0.010893766
## 144 -0.017017840 -0.017017840
## 145 -0.017630718 -0.017630718
## 146 -0.023863409 -0.023863409
## 147 -0.035616815 -0.035616815
## 148 -0.069598217 -0.069598217
## 149 -0.075321202 -0.075321202
## 150 -0.078622041 -0.078622041
## 151 -0.078867321 -0.078867321
## 152 -0.079430645 -0.079430645
## 153 -0.085676156 -0.085676156
## 154 -0.086524091 -0.086524091
## 155 -0.099445768 -0.099445768
## 156 -0.113918744 -0.113918744
## 157 -0.131521491 -0.131521491
## 158 -0.136800799 -0.136800799
## 159 -0.145487434 -0.145487434
## 160 -0.163941839 -0.163941839
## 161 -0.178233529 -0.178233529
## 162 -0.178421217 -0.178421217
## 163 -0.178886834 -0.178886834
## 164 -0.179760865 -0.179760865
## 165 -0.184424424 -0.184424424
## 166 -0.194964328 -0.194964328
## 167 -0.196012991 -0.196012991
## 168 -0.200061868 -0.200061868
## 169 -0.201603131 -0.201603131
## 170 -0.202553123 -0.202553123
## 171 -0.209601538 -0.209601538
## 172 -0.210393956 -0.210393956
## 173 -0.231896651 -0.231896651
## 174 -0.241122752 -0.241122752
## 175 -0.241462321 -0.241462321
## 176 -0.251521885 -0.251521885
## 177 -0.265916875 -0.265916875
## 178 -0.281172923 -0.281172923
## 179 -0.283255517 -0.283255517
## 180 -0.296365562 -0.296365562
## 181 -0.303301087 -0.303301087
## 182 -0.311789698 -0.311789698
## 183 -0.326480525 -0.326480525
## 184 -0.335690274 -0.335690274
## 185 -0.376321638 -0.376321638
## 186 -0.393309170 -0.393309170
## 187 -0.437123696 -0.437123696
## 188 -0.440050435 -0.440050435
## 189 -0.468169551 -0.468169551
## 190 -0.472949566 -0.472949566
## 191 -0.480983026 -0.480983026
## 192 -0.536197653 -0.536197653
## 193 -0.551524077 -0.551524077
## 194 -0.555599463 -0.555599463
## 195 -0.559508647 -0.559508647
## 196 -0.593923568 -0.593923568
## 197 -0.594426723 -0.594426723
## 198 -0.604395587 -0.604395587
## 199 -0.630036415 -0.630036415
## 200 -0.643846975 -0.643846975
## 201 -0.648256691 -0.648256691
## 202 -0.654437320 -0.654437320
## 203 -0.659706945 -0.659706945
## 204 -0.683015362 -0.683015362
## 205 -0.691501691 -0.691501691
## 206 -0.696497160 -0.696497160
## 207 -0.704676740 -0.704676740
## 208 -0.734421539 -0.734421539
## 209 -0.744775841 -0.744775841
## 210 -0.753947088 -0.753947088
## 211 -0.754412510 -0.754412510
## 212 -0.794471053 -0.794471053
## 213 -0.795759779 -0.795759779
## 214 -0.796814579 -0.796814579
## 215 -0.814852278 -0.814852278
## 216 -0.815820052 -0.815820052
## 217 -0.816408731 -0.816408731
## 218 -0.829524307 -0.829524307
## 219 -0.861018867 -0.861018867
## 220 -0.863363073 -0.863363073
## 221 -0.868302821 -0.868302821
## 222 -0.869580569 -0.869580569
## 223 -0.894738824 -0.894738824
## 224 -0.899513561 -0.899513561
## 225 -0.949098131 -0.949098131
## 226 -0.953311551 -0.953311551
## 227 -0.972931769 -0.972931769
## 228 -0.976729671 -0.976729671
## 229 -0.977592731 -0.977592731
## 230 -1.011107295 -1.011107295
## 231 -1.028384449 -1.028384449
## 232 -1.055701554 -1.055701554
## 233 -1.070031258 -1.070031258
## 234 -1.083249594 -1.083249594
## 235 -1.107128982 -1.107128982
## 236 -1.157202259 -1.157202259
## 237 -1.199101951 -1.199101951
## 238 -1.258769541 -1.258769541
## 239 -1.262069108 -1.262069108
## 240 -1.276258959 -1.276258959
## 241 -1.334471870 -1.334471870
## 242 -1.460723908 -1.460723908
## 243 -1.464555102 -1.464555102
## 244 -1.480346325 -1.480346325
## 245 -1.486916694 -1.486916694
## 246 -1.504021300 -1.504021300
## 247 -1.632010553 -1.632010553
## 248 -1.770856325 -1.770856325
## 249 -1.898081905 -1.898081905
## 250 -2.004853154 -2.004853154
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class classification case, keep the maximum
  # importance across the CN / Dementia / MCI columns for each
  # feature, then sort the features by that maximum.
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_rf_AUC<-auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.7143

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # plot every one-vs-rest curve with its own colour, matching the legend
  class_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = class_cols[1], lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_rf_AUC<-mean_auc
}
print(FeatEval_Median_rf_AUC)
## Area under the curve: 0.7143

9.2.6. SVM

9.2.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 454 samples
## 250 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 364, 363, 363, 363 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.8457875  0.6704147
##   0.50  0.8479853  0.6718115
##   1.00  0.8436142  0.6564437
## 
## Tuning parameter 'sigma' was held constant at a value of 0.002046494
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002046494 and C = 0.5.
print(svm_model$bestTune)
##         sigma   C
## 2 0.002046494 0.5
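caret held sigma constant because, for method = "svmRadial", it estimates sigma analytically (via kernlab::sigest) and only tunes the cost parameter C by default. A hypothetical grid that tunes both parameters (svm_grid and svm_model_tuned are illustrative names, not run above):

# Hypothetical grid search over both the RBF kernel width and the cost.
svm_grid <- expand.grid(sigma = c(0.001, 0.002, 0.004),
                        C = c(0.25, 0.5, 1, 2))
svm_model_tuned <- caret::train(DX ~ ., data = train_data_SVM1,
                                method = "svmRadial",
                                trControl = train_control,
                                tuneGrid = svm_grid)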
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8457957
FeatEval_Median_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Median_mean_accuracy_cv_svm)
## [1] 0.8457957
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.984581497797357"
FeatEval_Median_svm_trainAccuracy <- train_accuracy
print(FeatEval_Median_svm_trainAccuracy)
## [1] 0.9845815
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Median_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Median_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 106  13
##         CN  22  53
##                                          
##                Accuracy : 0.8196         
##                  95% CI : (0.7581, 0.871)
##     No Information Rate : 0.6598         
##     P-Value [Acc > NIR] : 5.878e-07      
##                                          
##                   Kappa : 0.611          
##                                          
##  Mcnemar's Test P-Value : 0.1763         
##                                          
##             Sensitivity : 0.8281         
##             Specificity : 0.8030         
##          Pos Pred Value : 0.8908         
##          Neg Pred Value : 0.7067         
##              Prevalence : 0.6598         
##          Detection Rate : 0.5464         
##    Detection Prevalence : 0.6134         
##       Balanced Accuracy : 0.8156         
##                                          
##        'Positive' Class : CI             
## 
cm_FeatEval_Median_svm_Accuracy <- cm_FeatEval_Median_svm$overall["Accuracy"]
cm_FeatEval_Median_svm_Kappa <- cm_FeatEval_Median_svm$overall["Kappa"]
print(cm_FeatEval_Median_svm_Accuracy)
##  Accuracy 
## 0.8195876
print(cm_FeatEval_Median_svm_Kappa)
##     Kappa 
## 0.6109774
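
As a quick sanity check on the summary above, the reported balanced accuracy is simply the mean of sensitivity and specificity:

# Balanced Accuracy = (Sensitivity + Specificity) / 2, using the values above
(0.8281 + 0.8030) / 2
## [1] 0.81555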

Let’s take a look at the feature importance of the trained model. The iml FeatureImp method permutes each feature and reports the ratio of the model’s error after permutation to its error before (here loss = "ce", the classification error), so values near 1 indicate a feature the model barely relies on.

library(iml)
predictor_SVM <- Predictor$new(svm_model, data = df_SVM, y = df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM, loss = "ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 251 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg05064044     1.0285714   1.047619      1.109524        0.06790123
## 2 cg16338321     0.9809524   1.023810      1.023810        0.06635802
## 3 cg25208881     1.0000000   1.023810      1.023810        0.06635802
## 4 cg01921484     0.9857143   1.023810      1.042857        0.06635802
## 5 cg16715186     1.0000000   1.023810      1.042857        0.06635802
## 6 cg09216282     1.0000000   1.023810      1.042857        0.06635802
plot(importance_SVM)

library(vip)

vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")

  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc
  FeatEval_Median_svm_AUC <- auc_value

  print(auc_value) 

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9071
## [1] "The auc vlue is:"
## Area under the curve: 0.9071

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  # use the same test set the probabilities were computed on
  classes <- levels(test_data_SVM1$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # use palette indices 2, 3, ... for the curves so they match the legend colors
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Median_svm_AUC <- mean_auc
}
print(FeatEval_Median_svm_AUC)
## Area under the curve: 0.9071

9.3 Selected Based on Frequency

9.3.1 Input Feature For Evaluation

Performance of the selected output features based on Frequency
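
As a reminder of how this feature set was constructed upstream, the sketch below illustrates frequency / common-feature selection. It is only a sketch: top_rf, top_svm and top_lrm stand in for hypothetical per-model top-N feature vectors, and a feature is kept as "common" when it appears in more than half of the models.

# Minimal sketch of frequency-based selection (hypothetical inputs)
top_lists <- list(rf = top_rf, svm = top_svm, lrm = top_lrm)
feature_counts <- table(unlist(top_lists))          # how often each feature appears
common_features <- names(feature_counts[feature_counts > length(top_lists) / 2])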

processed_dataFrame<-df_process_Output_freq
processed_data<-output_Frequency_Feature

AfterProcess_FeatureName<-df_process_frequency_FeatureName
print(head(output_Frequency_Feature))
## # A tibble: 6 × 272
##   DX         PC1 cg23432430 cg09727210        PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080 cg02887598 cg05064044
##   <fct>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1 CI    -0.214        0.948      0.424  0.0147         0.912      0.578      0.845      0.683      0.510      0.480      0.535      0.875      0.430     0.0365      0.419     0.0402     0.567 
## 2 CN    -0.173        0.946      0.881  0.0575         0.538      0.620      0.865      0.827      0.840      0.487      0.829      0.920      0.400     0.443       0.442     0.671      0.536 
## 3 CN    -0.00367      0.942      0.849  0.0837         0.504      0.624      0.241      0.521      0.847      0.493      0.492      0.880      0.747     0.447       0.436     0.734      0.527 
## 4 CI    -0.187        0.943      0.842 -0.0112         0.904      0.599      0.848      0.808      0.487      0.855      0.525      0.915      0.770     0.434       0.957     0.864      0.628 
## 5 CI     0.0268       0.946      0.425  0.0000165      0.896      0.631      0.821      0.608      0.889      0.488      0.842      0.917      0.742     0.747       0.946     0.836      0.566 
## 6 CN    -0.0379       0.951      0.460  0.0157         0.886      0.615      0.784      0.764      0.906      0.842      0.842      0.923      0.761     0.774       0.399     0.412      0.0830
## # ℹ 254 more variables: cg01910713 <dbl>, cg11331837 <dbl>, cg07504457 <dbl>, cg00004073 <dbl>, cg04156077 <dbl>, cg10738648 <dbl>, cg07640670 <dbl>, cg16858433 <dbl>, cg12543766 <dbl>,
## #   cg20685672 <dbl>, cg24851651 <dbl>, cg20678988 <dbl>, cg03088219 <dbl>, cg16536985 <dbl>, cg05234269 <dbl>, cg18285382 <dbl>, cg09216282 <dbl>, cg00084271 <dbl>, cg21697769 <dbl>,
## #   cg15098922 <dbl>, cg27577781 <dbl>, cg18150287 <dbl>, cg08096656 <dbl>, cg19503462 <dbl>, cg07634717 <dbl>, cg26853071 <dbl>, cg09247979 <dbl>, cg00154902 <dbl>, cg15184869 <dbl>,
## #   cg19471911 <dbl>, cg12702014 <dbl>, cg03979311 <dbl>, cg11787167 <dbl>, cg18857647 <dbl>, cg11540596 <dbl>, cg25712921 <dbl>, cg12240569 <dbl>, cg19301366 <dbl>, cg25436480 <dbl>,
## #   cg13387643 <dbl>, cg12421087 <dbl>, cg11227702 <dbl>, cg00648024 <dbl>, cg17002719 <dbl>, cg15633912 <dbl>, cg16715186 <dbl>, cg11019791 <dbl>, cg06880438 <dbl>, cg03660162 <dbl>,
## #   cg01008088 <dbl>, cg15535896 <dbl>, cg15600437 <dbl>, cg02078724 <dbl>, cg20823859 <dbl>, cg13372276 <dbl>, cg25208881 <dbl>, cg26679884 <dbl>, cg01921484 <dbl>, cg06960717 <dbl>,
## #   cg25169289 <dbl>, cg08584917 <dbl>, cg22305850 <dbl>, cg11133939 <dbl>, cg01608425 <dbl>, cg06371647 <dbl>, cg03749159 <dbl>, cg24697433 <dbl>, cg21986118 <dbl>, cg18816397 <dbl>, …
print(df_process_frequency_FeatureName)
##   [1] "PC1"        "cg23432430" "cg09727210" "PC2"        "cg00962106" "cg07158503" "cg06697310" "cg02225060" "cg09015880" "cg10701746" "cg16338321" "cg26081710" "cg00415024" "cg21757617" "cg14168080"
##  [16] "cg02887598" "cg05064044" "cg01910713" "cg11331837" "cg07504457" "cg00004073" "cg04156077" "cg10738648" "cg07640670" "cg16858433" "cg12543766" "cg20685672" "cg24851651" "cg20678988" "cg03088219"
##  [31] "cg16536985" "cg05234269" "cg18285382" "cg09216282" "cg00084271" "cg21697769" "cg15098922" "cg27577781" "cg18150287" "cg08096656" "cg19503462" "cg07634717" "cg26853071" "cg09247979" "cg00154902"
##  [46] "cg15184869" "cg19471911" "cg12702014" "cg03979311" "cg11787167" "cg18857647" "cg11540596" "cg25712921" "cg12240569" "cg19301366" "cg25436480" "cg13387643" "cg12421087" "cg11227702" "cg00648024"
##  [61] "cg17002719" "cg15633912" "cg16715186" "cg11019791" "cg06880438" "cg03660162" "cg01008088" "cg15535896" "cg15600437" "cg02078724" "cg20823859" "cg13372276" "cg25208881" "cg26679884" "cg01921484"
##  [76] "cg06960717" "cg25169289" "cg08584917" "cg22305850" "cg11133939" "cg01608425" "cg06371647" "cg03749159" "cg24697433" "cg21986118" "cg18816397" "cg01128042" "cg15700429" "cg25277809" "cg22931151"
##  [91] "cg24634455" "cg13405878" "cg02932958" "cg11286989" "cg05593887" "cg18918831" "cg11247378" "cg24139837" "cg17042243" "cg25879395" "cg18029737" "cg10681981" "cg26846609" "cg14293999" "cg10240127"
## [106] "cg08198851" "cg18993517" "cg02823329" "cg08745107" "cg13573375" "cg17738613" "cg02356645" "cg05876883" "cg24883219" "cg00696044" "cg17131279" "cg08041188" "cg24307368" "cg06961873" "cg05392160"
## [121] "cg26983017" "cg07138269" "cg04316537" "cg27224751" "cg04831745" "cg12556569" "cg17386240" "cg04412904" "cg00345083" "cg02668233" "cg10788927" "cg14687298" "cg14170504" "cg03672288" "cg14307563"
## [136] "cg09451339" "cg16431720" "cg01662749" "cg02495179" "cg04768387" "cg17002338" "cg01933473" "cg16089727" "cg24643105" "PC3"        "cg00819121" "cg09120722" "cg27272246" "cg06277607" "cg03982462"
## [151] "cg09584650" "cg08788093" "cg22666875" "cg22542451" "cg00939409" "cg17723206" "cg05321907" "cg12776173" "cg25758034" "cg14710850" "cg23517115" "cg17429539" "cg17906851" "cg00512739" "cg12689021"
## [166] "cg16571124" "cg22071943" "cg25649515" "cg04497611" "cg15730644" "cg13739190" "cg25306893" "cg16779438" "cg06483046" "cg14780448" "cg06833284" "cg14507637" "cg18819889" "cg03549208" "cg15985500"
## [181] "cg05161773" "cg06403901" "cg22169467" "cg08857872" "cg11187460" "cg03600007" "cg05850457" "cg06715136" "cg10091792" "cg03221390" "cg02122327" "cg21139150" "cg14192979" "cg23352245" "cg00146240"
## [196] "cg20981163" "cg27160885" "cg00553601" "cg12146221" "cg13226272" "cg22112152" "cg23836570" "cg08554146" "cg09785377" "cg01462799" "cg06118351" "cg17129965" "cg18339359" "cg11438323" "cg00295418"
## [211] "cg08896901" "cg18526121" "cg02550738" "cg04664583" "cg07028768" "cg01549082" "cg13815695" "cg02627240" "cg19799454" "cg06864789" "cg03737947" "cg14532717" "cg22535849" "cg04718469" "cg14627380"
## [226] "cg10039445" "cg02631626" "cg20673830" "cg17268094" "cg11706829" "cg16733676" "cg20078646" "cg13368637" "cg16652920" "cg26901661" "cg04888234" "cg04242342" "cg00322820" "cg23066280" "cg07480955"
## [241] "cg02772171" "cg21243064" "cg21388339" "cg01153376" "cg15775217" "cg02621446" "cg10666341" "cg23177161" "cg02246922" "cg25174111" "cg00322003" "cg15586958" "cg06231502" "age.now"    "cg18949721"
## [256] "cg12228670" "cg11314779" "cg23916408" "cg01280698" "cg04124201" "cg12784167" "cg04645024" "cg16202259" "cg11268585" "cg15501526" "cg03084184" "cg12333628" "cg21783012" "cg13038195" "cg04867412"
## [271] "cg20803293"
print(length(df_process_frequency_FeatureName))
## [1] 271
Num_KeyFea_Frequency <- length(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
##                     DX          PC1 cg23432430 cg09727210        PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080
## 200223270003_R02C01 CI -0.214185447  0.9482702  0.4240111 0.01470293  0.9124898  0.5777146  0.8454609  0.6828159  0.5101716  0.4795503  0.5350242  0.8751040  0.4299553 0.03652647  0.4190123
## 200223270003_R03C01 CN -0.172761185  0.9455418  0.8812928 0.05745834  0.5375751  0.6203543  0.8653044  0.8265195  0.8402106  0.4868342  0.8294062  0.9198212  0.3999122 0.44299089  0.4420256
## 200223270003_R06C01 CN -0.003667305  0.9418716  0.8493743 0.08372861  0.5040948  0.6236025  0.2405168  0.5209552  0.8472063  0.4927257  0.4918708  0.8801892  0.7465084 0.44725379  0.4355521
##                     cg02887598 cg05064044 cg01910713 cg11331837 cg07504457 cg00004073 cg04156077 cg10738648 cg07640670 cg16858433 cg12543766 cg20685672 cg24851651 cg20678988  cg03088219 cg16536985
## 200223270003_R02C01 0.04020908  0.5672851  0.8573169 0.03692842  0.7116230 0.02928535  0.7321883 0.44931577 0.58296513  0.9184356 0.51028134  0.6712101 0.03674702  0.8438718 0.844002862  0.5789643
## 200223270003_R03C01 0.67073881  0.5358875  0.8538850 0.57150125  0.6854539 0.02787198  0.6865805 0.49894016 0.55225610  0.9194211 0.88741539  0.7932091 0.05358297  0.8548886 0.007435243  0.5418687
## 200223270003_R06C01 0.73408417  0.5273964  0.8110366 0.03182862  0.7205633 0.64576463  0.8501188 0.05552024 0.04058533  0.9271632 0.02818501  0.6613646 0.05968923  0.7786685 0.120155222  0.8392044
##                     cg05234269 cg18285382 cg09216282 cg00084271 cg21697769 cg15098922 cg27577781 cg18150287 cg08096656 cg19503462 cg07634717 cg26853071 cg09247979 cg00154902 cg15184869 cg19471911
## 200223270003_R02C01 0.93848584  0.3202927  0.9349248  0.8103611  0.8946108  0.9286092  0.8143535  0.7685695  0.9362594  0.7951675  0.7483382  0.4233820  0.5070956  0.5137741  0.8622328  0.6334393
## 200223270003_R03C01 0.57461229  0.2930577  0.9244259  0.7877006  0.2822953  0.9027517  0.8113185  0.7519166  0.9314878  0.4537684  0.8254434  0.7451354  0.5706177  0.8540746  0.8996252  0.8437175
## 200223270003_R06C01 0.02467208  0.8923595  0.9263996  0.7706165  0.8698740  0.8525611  0.8144274  0.2501173  0.4943033  0.6997359  0.8181246  0.4228079  0.5090215  0.8188126  0.8688117  0.6127952
##                     cg12702014 cg03979311 cg11787167 cg18857647 cg11540596 cg25712921 cg12240569 cg19301366 cg25436480 cg13387643 cg12421087 cg11227702 cg00648024 cg17002719 cg15633912 cg16715186
## 200223270003_R02C01  0.7704049 0.86644909 0.03853894  0.8582332  0.9238951  0.2829848 0.82772064  0.8831393  0.8425160  0.4229959  0.5647607 0.86486075 0.51410972 0.04939181  0.1605530  0.2742789
## 200223270003_R03C01  0.7848681 0.06199853 0.04673831  0.8394132  0.8926595  0.6220919 0.02690547  0.8072679  0.4994032  0.4200273  0.5399655 0.49184121 0.40202875 0.40466475  0.9333421  0.7946153
## 200223270003_R06C01  0.8065993 0.72615553 0.32564508  0.2647491  0.8820252  0.6384003 0.46030640  0.8796022  0.3494312  0.4161488  0.5400348 0.02543724 0.05579011 0.51428089  0.8737362  0.8124316
##                     cg11019791 cg06880438 cg03660162 cg01008088 cg15535896 cg15600437 cg02078724 cg20823859 cg13372276 cg25208881 cg26679884 cg01921484 cg06960717 cg25169289 cg08584917 cg22305850
## 200223270003_R02C01  0.8112324  0.8285145  0.8691767  0.8424817  0.3382952  0.4885353  0.3096774  0.9030711 0.04888111  0.1851956  0.6793815  0.9098550  0.7030978  0.1100884  0.5663205 0.03361934
## 200223270003_R03C01  0.7831231  0.7988881  0.5160770  0.2417656  0.9253926  0.4894487  0.2896133  0.6062985 0.62396373  0.9092286  0.1848705  0.9093137  0.7653402  0.7667174  0.9019732 0.57522232
## 200223270003_R06C01  0.4353250  0.7839538  0.9026304  0.2618620  0.3320191  0.8551374  0.2805612  0.8917348 0.59693465  0.9265502  0.1701734  0.9204487  0.7206218  0.2264993  0.9187789 0.58548744
##                     cg11133939 cg01608425 cg06371647 cg03749159 cg24697433 cg21986118 cg18816397 cg01128042 cg15700429 cg25277809 cg22931151 cg24634455 cg13405878 cg02932958 cg11286989 cg05593887
## 200223270003_R02C01  0.1282694  0.9030410  0.8336894  0.9355921  0.9243095  0.6658175  0.5472925  0.9113420  0.7879010  0.1632342  0.9311023  0.7796391  0.4549662  0.7901008  0.7590008  0.5939220
## 200223270003_R03C01  0.5920898  0.9264388  0.8198684  0.9153921  0.6808390  0.6571296  0.4940355  0.5328806  0.9114530  0.4913711  0.9356702  0.5188241  0.7858042  0.4210489  0.8533989  0.5766550
## 200223270003_R06C01  0.5127706  0.8887753  0.8069537  0.9255807  0.6384606  0.7034445  0.5337018  0.5222757  0.8838233  0.5952124  0.9328614  0.5325725  0.7583938  0.3825995  0.7313884  0.9148338
##                     cg18918831 cg11247378 cg24139837 cg17042243 cg25879395 cg18029737 cg10681981 cg26846609 cg14293999 cg10240127 cg08198851 cg18993517 cg02823329 cg08745107 cg13573375 cg17738613
## 200223270003_R02C01  0.4891660  0.1591185 0.07404605  0.2502905 0.88130864  0.9100454  0.7035090 0.48860949  0.2836710  0.9250553  0.6578905  0.2091538  0.9462397 0.02921338  0.8670419  0.6879612
## 200223270003_R03C01  0.5333801  0.7874849 0.04183445  0.2933475 0.02603438  0.9016634  0.7382662 0.04878986  0.9172023  0.9403255  0.6578186  0.2665896  0.6464005 0.78542320  0.1733934  0.6582258
## 200223270003_R06C01  0.6406575  0.4807942 0.05657120  0.2725457 0.91060615  0.7376586  0.6971989 0.48026945  0.9168166  0.9056974  0.1272153  0.2574003  0.9633930 0.02709928  0.8888246  0.1022257
##                     cg02356645 cg05876883 cg24883219 cg00696044 cg17131279 cg08041188 cg24307368 cg06961873 cg05392160 cg26983017 cg07138269 cg04316537 cg27224751 cg04831745 cg12556569 cg17386240
## 200223270003_R02C01  0.5105903  0.9039064  0.6430473 0.55608424  0.1900637  0.7752456 0.64323677  0.5335591  0.9328933 0.89868232  0.5002290  0.8074830 0.44503947 0.61984995 0.06218231  0.7473400
## 200223270003_R03C01  0.5833923  0.9223308  0.6822115 0.07552381  0.7048637  0.3201255 0.34980461  0.5472606  0.2576881 0.03145466  0.9426707  0.8453340 0.03214912 0.71214149 0.03924599  0.7144809
## 200223270003_R06C01  0.5701428  0.4697980  0.5296903 0.79270858  0.1492861  0.7900939 0.02720398  0.9415177  0.8920726 0.84677625  0.5057781  0.4351695 0.83123722 0.06871768 0.48636893  0.8074824
##                     cg04412904 cg00345083 cg02668233 cg10788927 cg14687298 cg14170504 cg03672288 cg14307563 cg09451339 cg16431720 cg01662749 cg02495179 cg04768387 cg17002338 cg01933473 cg16089727
## 200223270003_R02C01 0.05088595 0.47960968  0.4708431  0.8973154 0.04206702 0.54915621  0.9235592  0.1855966  0.2243746  0.7356099  0.3506201  0.6813307  0.3131047  0.9286251  0.2589014 0.86748697
## 200223270003_R03C01 0.07717659 0.50833875  0.8841930  0.2021398 0.14813581 0.02236650  0.6718625  0.8916957  0.2340702  0.8692449  0.2510946  0.7373055  0.9465814  0.2684163  0.6726133 0.54996692
## 200223270003_R06C01 0.08253743 0.03929249  0.4575646  0.2053075 0.24260002 0.02988245  0.9007629  0.8750052  0.8921284  0.8773137  0.8061480  0.5588114  0.9098563  0.2811103  0.2642560 0.05876736
##                     cg24643105          PC3 cg00819121 cg09120722 cg27272246 cg06277607 cg03982462 cg09584650 cg08788093 cg22666875 cg22542451 cg00939409 cg17723206 cg05321907 cg12776173 cg25758034
## 200223270003_R02C01  0.5303418 -0.014043316  0.9207001  0.5878977  0.8615873 0.10744587  0.8562777 0.08230254 0.03911678  0.8177182  0.5884356  0.2652180 0.92881042  0.2880477  0.1038804  0.6114028
## 200223270003_R03C01  0.5042688  0.005055871  0.9281472  0.8287506  0.8705287 0.09353494  0.6023731 0.09661586 0.60934160  0.8291957  0.8337068  0.8882671 0.48556255  0.1782629  0.8730635  0.6649219
## 200223270003_R06C01  0.9383050  0.029143653  0.9327211  0.8793344  0.8103777 0.09504696  0.8778458 0.52399749 0.88380243  0.3694180  0.8125084  0.8842646 0.01765023  0.8427929  0.7009491  0.2393844
##                     cg14710850 cg23517115 cg17429539 cg17906851 cg00512739 cg12689021 cg16571124 cg22071943 cg25649515 cg04497611 cg15730644 cg13739190 cg25306893 cg16779438 cg06483046 cg14780448
## 200223270003_R02C01  0.8048592  0.2151144  0.7860900  0.9488392  0.9337648  0.7706828  0.9282854  0.8705217  0.9279829  0.9086359  0.4803181  0.8510103  0.6265392  0.8826150 0.04383925  0.9119141
## 200223270003_R03C01  0.8090950  0.9131440  0.7100923  0.9529718  0.8863895  0.7449475  0.9206431  0.2442648  0.9235753  0.8818513  0.4353906  0.8358482  0.8330282  0.5466924 0.50720277  0.6702102
## 200223270003_R06C01  0.8285902  0.8328364  0.7660838  0.6462151  0.9242748  0.7872237  0.9276842  0.2644581  0.5895839  0.5853116  0.8763048  0.8419471  0.6175380  0.8629492 0.89604910  0.6207355
##                     cg06833284 cg14507637 cg18819889 cg03549208 cg15985500 cg05161773 cg06403901 cg22169467 cg08857872 cg11187460 cg03600007 cg05850457 cg06715136 cg10091792 cg03221390 cg02122327
## 200223270003_R02C01  0.9125144  0.9051258  0.9156157  0.9014487  0.8555262  0.4120912 0.92790690  0.3095010  0.3395280 0.03672179  0.5658487  0.8183013  0.3400192  0.8670733  0.5859063 0.38940091
## 200223270003_R03C01  0.9003482  0.9009460  0.9004455  0.8381784  0.8312198  0.4154907 0.04783341  0.2978585  0.8181845 0.92516409  0.6018832  0.8313023  0.9259109  0.5864221  0.9180706 0.37769608
## 200223270003_R06C01  0.6097933  0.9013686  0.9054439  0.9097817  0.8492103  0.8526849 0.05253626  0.8955853  0.2970779 0.03109553  0.8611166  0.8161364  0.9079807  0.6087997  0.6399867 0.04017909
##                     cg21139150 cg14192979 cg23352245 cg00146240 cg20981163 cg27160885 cg00553601 cg12146221 cg13226272 cg22112152 cg23836570 cg08554146 cg09785377 cg01462799 cg06118351 cg17129965
## 200223270003_R02C01 0.01853264 0.06336040  0.9377232  0.6336151  0.8990628  0.2231606 0.05601299  0.2049284 0.02637249  0.8476101 0.58688450  0.8982080  0.9162088  0.8284427  0.3633940  0.8972140
## 200223270003_R03C01 0.43223243 0.06019651  0.9375774  0.8957183  0.9264076  0.8263885 0.58957701  0.1814927 0.54100016  0.8014136 0.54259383  0.8963074  0.9226292  0.4038824  0.4714860  0.8806673
## 200223270003_R06C01 0.43772680 0.52114282  0.5932742  0.1433218  0.4874651  0.2121179 0.62426500  0.8619250 0.44370701  0.7897897 0.03267304  0.8213878  0.6405193  0.4676821  0.8655962  0.8857237
##                     cg18339359 cg11438323 cg00295418 cg08896901 cg18526121 cg02550738 cg04664583 cg07028768 cg01549082 cg13815695 cg02627240 cg19799454 cg06864789 cg03737947 cg14532717 cg22535849
## 200223270003_R02C01  0.8824858  0.4863471 0.44954665  0.3581911  0.4519781  0.6201457  0.5572814  0.4496851  0.2924138  0.9267057 0.66706843  0.9178930 0.05369415 0.91824910  0.5732280  0.8847704
## 200223270003_R03C01  0.9040272  0.8984559 0.48471295  0.2467071  0.4762313  0.9011727  0.5881190  0.8536078  0.7065693  0.6859729 0.57129408  0.9106247 0.46053125 0.92067153  0.1107638  0.8609966
## 200223270003_R06C01  0.8552121  0.8722772 0.02004532  0.9225209  0.4833367  0.9085849  0.9352717  0.8356936  0.2895440  0.6509046 0.05309659  0.9066551 0.87513655 0.03638091  0.6273416  0.8808022
##                     cg04718469 cg14627380 cg10039445 cg02631626 cg20673830 cg17268094 cg11706829 cg16733676 cg20078646 cg13368637 cg16652920 cg26901661 cg04888234 cg04242342 cg00322820 cg23066280
## 200223270003_R02C01  0.8687522  0.9455369  0.8833873  0.6280766  0.2422052  0.5774753  0.8897234  0.9057228 0.06198170  0.5597507  0.9436000  0.8951971  0.8379655  0.8206769  0.4869764 0.07247841
## 200223270003_R03C01  0.7256813  0.9258964  0.8954055  0.1951736  0.6881735  0.9003262  0.5444785  0.8904541 0.89537412  0.9100088  0.9431222  0.8754981  0.4376314  0.8167892  0.4858988 0.57174588
## 200223270003_R06C01  0.8521881  0.5789898  0.8832807  0.2699849  0.2134634  0.8789368  0.5669449  0.1698111 0.08725521  0.8739205  0.9457161  0.9021064  0.8039047  0.8040357  0.4754313 0.80814756
##                     cg07480955 cg02772171 cg21243064 cg21388339 cg01153376 cg15775217 cg02621446 cg10666341 cg23177161 cg02246922 cg25174111 cg00322003 cg15586958 cg06231502 age.now cg18949721
## 200223270003_R02C01  0.3874638  0.9182018  0.5191606  0.2756268  0.4872148  0.5707441  0.8731313  0.9046648  0.4151698  0.7301201  0.8526503  0.1759911  0.9058263  0.7784451    82.4  0.2334245
## 200223270003_R03C01  0.3916889  0.5660559  0.9167649  0.2102269  0.9639670  0.9168327  0.8095534  0.6731062  0.4586576  0.9447019  0.8573844  0.5702070  0.8957526  0.7964278    78.6  0.2437792
## 200223270003_R06C01  0.4043390  0.8995479  0.4862205  0.7649181  0.2242410  0.6042521  0.7511582  0.6443180  0.8287312  0.7202230  0.2567745  0.3077122  0.9121763  0.7706160    80.4  0.2523095
##                     cg12228670 cg11314779 cg23916408 cg01280698 cg04124201 cg12784167 cg04645024 cg16202259 cg11268585 cg15501526 cg03084184 cg12333628 cg21783012 cg13038195 cg04867412 cg20803293
## 200223270003_R02C01  0.8632174  0.0242134  0.1942275  0.8985067  0.8686421 0.81503498  0.7366541  0.9548726  0.2521544  0.6362531  0.8162981  0.9227884  0.9142369 0.45882213 0.04304823 0.54933918
## 200223270003_R03C01  0.8496212  0.8966100  0.9154993  0.8846201  0.3308589 0.02811410  0.8454827  0.3713483  0.8535791  0.6319253  0.7877128  0.9092861  0.6694884 0.02740132 0.87967997 0.07935747
## 200223270003_R06C01  0.8738949  0.8908661  0.8886255  0.8847132  0.3241613 0.03073269  0.0871902  0.4852461  0.9121931  0.7435100  0.4546397  0.5084647  0.9070112 0.46284376 0.44971146 0.42191244
##  [ reached 'max' / getOption("max.print") -- omitted 3 rows ]

9.3.2. Logistic Regression Model

9.3.2.1 Logistic Regression Model Training

df_LRM1<-processed_data 
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 272
dim(testData)
## [1] 194 272
ctrl <- trainControl(method = "cv", number = 5)

model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Freq_LRM1<-caret::confusionMatrix(predictions, testData$DX)

print(cm_FeatEval_Freq_LRM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 118  17
##         CN  10  49
##                                          
##                Accuracy : 0.8608         
##                  95% CI : (0.804, 0.9062)
##     No Information Rate : 0.6598         
##     P-Value [Acc > NIR] : 1.846e-10      
##                                          
##                   Kappa : 0.6818         
##                                          
##  Mcnemar's Test P-Value : 0.2482         
##                                          
##             Sensitivity : 0.9219         
##             Specificity : 0.7424         
##          Pos Pred Value : 0.8741         
##          Neg Pred Value : 0.8305         
##              Prevalence : 0.6598         
##          Detection Rate : 0.6082         
##    Detection Prevalence : 0.6959         
##       Balanced Accuracy : 0.8321         
##                                          
##        'Positive' Class : CI             
## 
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Freq_LRM1_Accuracy <- cm_FeatEval_Freq_LRM1$overall["Accuracy"]
cm_FeatEval_Freq_LRM1_Kappa <- cm_FeatEval_Freq_LRM1$overall["Kappa"]

print(cm_FeatEval_Freq_LRM1_Accuracy)
##  Accuracy 
## 0.8608247
print(cm_FeatEval_Freq_LRM1_Kappa)
##     Kappa 
## 0.6818127
print(model_LRM1)
## glmnet 
## 
## 454 samples
## 271 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001769938  0.7950916  0.5407212
##   0.10   0.0017699384  0.7906960  0.5318216
##   0.10   0.0176993845  0.7884493  0.5171950
##   0.55   0.0001769938  0.7599023  0.4610387
##   0.55   0.0017699384  0.7465690  0.4220933
##   0.55   0.0176993845  0.7025885  0.3120763
##   1.00   0.0001769938  0.7290354  0.3971826
##   1.00   0.0017699384  0.7267643  0.3850125
##   1.00   0.0176993845  0.6761172  0.2344042
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001769938.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")

train_accuracy <- mean(train_predictions == trainData$DX)

FeatEval_Freq_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Freq_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7461349
FeatEval_Freq_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Freq_mean_accuracy_cv_LRM1)
## [1] 0.7461349
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")

  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value

  print(roc_curve)

  print("The auc value is:")
  print(auc_value)

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9074
## [1] "The auc value is:"
## Area under the curve: 0.9074

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # use palette indices 2, 3, ... for the curves so they match the legend colors
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_LRM1_AUC <- mean_auc
}
importance_model_LRM1 <- varImp(model_LRM1)

print(importance_model_LRM1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 271)
## 
##            Overall
## PC3         100.00
## PC1          51.90
## cg09727210   37.16
## PC2          36.46
## cg23432430   32.91
## cg06697310   30.77
## cg07158503   30.61
## cg10701746   30.49
## cg09015880   28.97
## cg00962106   28.44
## cg00415024   26.14
## cg16858433   25.31
## cg14168080   25.13
## cg01910713   25.04
## cg02225060   24.45
## cg16338321   24.38
## cg26081710   23.76
## cg00819121   23.65
## cg05064044   23.20
## cg04156077   22.86
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")

importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 ||METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)

library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM1)  
  
}
##         Overall
## 1   15.92734220
## 2    8.26553545
## 3    5.91813200
## 4    5.80674878
## 5    5.24165459
## 6    4.90007485
## 7    4.87592489
## 8    4.85551455
## 9    4.61478136
## 10   4.53022365
## 11   4.16272340
## 12   4.03129520
## 13   4.00281089
## 14   3.98774531
## 15   3.89346533
## 16   3.88254220
## 17   3.78423706
## 18   3.76634572
## 19   3.69539997
## 20   3.64111247
## 21   3.62906951
## 22   3.50557204
## 23   3.47185706
## 24   3.41089554
## 25   3.37023298
## 26   3.33784876
## 27   3.24295118
## 28   3.18783280
## 29   3.17118905
## 30   3.15948395
## 31   3.15207396
## 32   3.14412859
## 33   3.09415071
## 34   3.06957299
## 35   2.94472526
## 36   2.89071688
## 37   2.88287554
## 38   2.86128463
## 39   2.85118996
## 40   2.84574903
## 41   2.81554859
## 42   2.80927697
## 43   2.80517185
## 44   2.75032214
## 45   2.74544241
## 46   2.72229785
## 47   2.71512892
## 48   2.70563004
## 49   2.68881594
## 50   2.67497092
## 51   2.65305149
## 52   2.65034016
## 53   2.64675913
## 54   2.64283671
## 55   2.63956272
## 56   2.63382078
## 57   2.55010382
## 58   2.54179777
## 59   2.54152160
## 60   2.53843703
## 61   2.51642841
## 62   2.46763922
## 63   2.45223586
## 64   2.44231686
## 65   2.43971963
## 66   2.43296810
## 67   2.41903787
## 68   2.34682815
## 69   2.33677971
## 70   2.31543299
## 71   2.28517697
## 72   2.28085467
## 73   2.26032445
## 74   2.25632869
## 75   2.24244679
## 76   2.22270304
## 77   2.20897369
## 78   2.20668909
## 79   2.19300211
## 80   2.15743202
## 81   2.15571784
## 82   2.13557122
## 83   2.12759925
## 84   2.12413107
## 85   2.10878172
## 86   2.09867379
## 87   2.08523013
## 88   2.07564935
## 89   2.06332618
## 90   2.05504806
## 91   2.04866093
## 92   2.04764123
## 93   2.04510807
## 94   2.03968425
## 95   2.03310649
## 96   2.02310863
## 97   2.02090364
## 98   2.01771715
## 99   2.01126801
## 100  2.00858787
## 101  1.97862630
## 102  1.95031494
## 103  1.93946310
## 104  1.92216973
## 105  1.85780256
## 106  1.85331233
## 107  1.84938178
## 108  1.84788084
## 109  1.83724884
## 110  1.83363744
## 111  1.82774671
## 112  1.80738549
## 113  1.79558711
## 114  1.78021849
## 115  1.77421627
## 116  1.73387532
## 117  1.70530495
## 118  1.69074980
## 119  1.68177368
## 120  1.68049238
## 121  1.67459279
## 122  1.65826010
## 123  1.63648018
## 124  1.60782698
## 125  1.60611836
## 126  1.60053194
## 127  1.59955019
## 128  1.56860331
## 129  1.56552464
## 130  1.55338208
## 131  1.54756177
## 132  1.53882246
## 133  1.52324462
## 134  1.51394830
## 135  1.50855095
## 136  1.48594821
## 137  1.47217799
## 138  1.44952536
## 139  1.44184955
## 140  1.43378675
## 141  1.42108908
## 142  1.41992839
## 143  1.41932249
## 144  1.39363199
## 145  1.38538965
## 146  1.37306250
## 147  1.37055390
## 148  1.36808893
## 149  1.36142615
## 150  1.35462389
## 151  1.35345413
## 152  1.34872905
## 153  1.34292994
## 154  1.31499444
## 155  1.31138144
## 156  1.28674891
## 157  1.27758196
## 158  1.27044973
## 159  1.26575020
## 160  1.26292765
## 161  1.26246812
## 162  1.25048145
## 163  1.24048572
## 164  1.23670010
## 165  1.22925034
## 166  1.21403958
## 167  1.21131559
## 168  1.20066084
## 169  1.17808268
## 170  1.16629580
## 171  1.16390465
## 172  1.08804726
## 173  1.07394317
## 174  1.04406658
## 175  1.03191534
## 176  1.03182960
## 177  1.02102657
## 178  1.01607669
## 179  0.99371527
## 180  0.98240805
## 181  0.97660766
## 182  0.97033624
## 183  0.95422433
## 184  0.94655603
## 185  0.91845543
## 186  0.90735773
## 187  0.89674758
## 188  0.89625854
## 189  0.87775208
## 190  0.86602268
## 191  0.86132515
## 192  0.84916831
## 193  0.84315319
## 194  0.84220267
## 195  0.84003417
## 196  0.82543963
## 197  0.82275814
## 198  0.78033920
## 199  0.75428505
## 200  0.75097759
## 201  0.74134410
## 202  0.73624509
## 203  0.73510359
## 204  0.72034140
## 205  0.71810061
## 206  0.71654019
## 207  0.71596246
## 208  0.71224241
## 209  0.71074903
## 210  0.70262216
## 211  0.69122376
## 212  0.65799677
## 213  0.63378638
## 214  0.63378289
## 215  0.62351543
## 216  0.61428282
## 217  0.59611272
## 218  0.59606282
## 219  0.59277437
## 220  0.58208494
## 221  0.57437958
## 222  0.57070692
## 223  0.52074019
## 224  0.50658547
## 225  0.50548040
## 226  0.49464179
## 227  0.47263809
## 228  0.45940064
## 229  0.45940029
## 230  0.45569540
## 231  0.42800884
## 232  0.36835256
## 233  0.36272264
## 234  0.35424810
## 235  0.34831125
## 236  0.31323372
## 237  0.30473683
## 238  0.29793263
## 239  0.29642103
## 240  0.29363882
## 241  0.28951455
## 242  0.27106675
## 243  0.26684098
## 244  0.26372796
## 245  0.23993772
## 246  0.22894579
## 247  0.22884951
## 248  0.19553089
## 249  0.17487794
## 250  0.15908749
## 251  0.12810849
## 252  0.12441775
## 253  0.11685277
## 254  0.09415279
## 255  0.07780898
## 256  0.07364417
## 257  0.06672353
## 258  0.06619426
## 259  0.05178736
## 260  0.04787197
## 261  0.03467063
## 262  0.02487359
## 263  0.02331246
## 264  0.01847575
## 265  0.00000000
## 266  0.00000000
## 267  0.00000000
## 268  0.00000000
## 269  0.00000000
## 270  0.00000000
## 271  0.00000000
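
Note that dplyr::arrange() drops the row names of a plain data frame, which is why the ordered importance table above shows bare row numbers instead of CpG identifiers. A minimal sketch of one way to keep the names, assuming varImp() returned a data frame with the features as row names:

library(dplyr)
library(tibble)

ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>%
  rownames_to_column(var = "Feature") %>%   # preserve the CpG names as a column
  arrange(desc(Overall))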
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM1_df)
  
}
# install reshape2 on first use; require() already attaches it when available
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}

if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df,n=20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}

9.3.2.2 Model Diagnosis & Improvement

9.3.2.2.1 Class imbalance
Class imbalance Check
  • Let’s plot the distribution of “DX” using a bar plot.
table(df_LRM1$DX)
## 
##  CI  CN 
## 427 221
prop.table(table(df_LRM1$DX))
## 
##        CI        CN 
## 0.6589506 0.3410494
table(trainData$DX)
## 
##  CI  CN 
## 299 155
prop.table(table(trainData$DX))
## 
##        CI        CN 
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")

For the training data set:

barplot(table(trainData$DX), main = "Train Data Class Distribution")

  • Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance; here a ratio near 2 means the majority class (CI) has roughly twice as many samples as the minority class (CN).

    class_counts <- table(df_LRM1$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the whole data set is:")
    ## [1] "The imbalance ratio of the whole data set is:"
    print(imbalance_ratio)
    ## [1] 1.932127
    class_counts <- table(trainData$DX)
    imbalance_ratio <- max(class_counts) / min(class_counts)
    print("The imbalance ratio of the training data set is:")
    ## [1] "The imbalance ratio of the training data set is:"
    print(imbalance_ratio)
    ## [1] 1.929032
  • Let’s run a Chi-square test, which determines whether the class distribution deviates significantly from a balanced distribution; the p-value reported by the test indicates how significant the imbalance is.

    chisq.test(table(df_LRM1$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(df_LRM1$DX)
    ## X-squared = 65.488, df = 1, p-value = 5.848e-16
    chisq.test(table(trainData$DX))
    ## 
    ##  Chi-squared test for given probabilities
    ## 
    ## data:  table(trainData$DX)
    ## X-squared = 45.674, df = 1, p-value = 1.397e-11
Solve Class Imbalance Using “SMOTE” (NOT FINALIZED; MAY NEED FURTHER IMPROVEMENT)
library(smotefamily)

smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                          target = trainData$DX, K = 5, dup_size = 1)

balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
## 
##  CI  CN 
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 272
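Since the SMOTE step above is flagged as not finalized, one alternative worth considering (not what is done below) is caret's built-in sampling option, which re-balances inside each cross-validation fold and so never evaluates a fold on synthetic points:

# Sketch: up-sample the minority class within each CV fold instead of SMOTE
ctrl_up <- trainControl(method = "cv", number = 5, sampling = "up")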
Fit Model with Balanced Data
ctrl <- trainControl(method = "cv", number = 5)

model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)

predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 119  16
##         CN   9  50
##                                           
##                Accuracy : 0.8711          
##                  95% CI : (0.8157, 0.9148)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 1.656e-11       
##                                           
##                   Kappa : 0.7054          
##                                           
##  Mcnemar's Test P-Value : 0.2301          
##                                           
##             Sensitivity : 0.9297          
##             Specificity : 0.7576          
##          Pos Pred Value : 0.8815          
##          Neg Pred Value : 0.8475          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6134          
##    Detection Prevalence : 0.6959          
##       Balanced Accuracy : 0.8436          
##                                           
##        'Positive' Class : CI              
## 
print(model_LRM2)
## glmnet 
## 
## 609 samples
## 271 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 488, 487, 487, 487, 487 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        Accuracy   Kappa    
##   0.10   0.0001799134  0.8767918  0.7530306
##   0.10   0.0017991337  0.8751660  0.7497261
##   0.10   0.0179913372  0.8735266  0.7464651
##   0.55   0.0001799134  0.8571196  0.7134450
##   0.55   0.0017991337  0.8571061  0.7134548
##   0.55   0.0179913372  0.8242650  0.6475373
##   1.00   0.0001799134  0.8472565  0.6936802
##   1.00   0.0017991337  0.8456171  0.6903176
##   1.00   0.0179913372  0.7980219  0.5948105
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001799134.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")  # evaluated on the original (pre-SMOTE) training split


train_accuracy <- mean(train_predictions == trainData$DX)


print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8505412
importance_model_LRM2 <- varImp(model_LRM2)


print(importance_model_LRM2)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 271)
## 
##            Overall
## PC3         100.00
## PC1          49.89
## PC2          36.15
## cg09727210   35.79
## cg23432430   30.16
## cg06697310   29.33
## cg07158503   28.54
## cg10701746   27.66
## cg00962106   27.25
## cg09015880   27.09
## cg16858433   25.65
## cg00415024   24.88
## cg01910713   24.71
## cg16338321   24.05
## cg14168080   23.89
## cg00819121   23.50
## cg02225060   23.18
## cg26081710   22.12
## cg05064044   22.03
## cg04156077   22.02
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")

importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5|| METHOD_FEATURE_FLAG==6){

importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)

library(dplyr)

ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))

print(ordered_importance_final_model_LRM2)  
  
}
##          Overall
## 1   16.818284630
## 2    8.390124327
## 3    6.079895098
## 4    6.019900448
## 5    5.072437562
## 6    4.933163688
## 7    4.800568592
## 8    4.652198970
## 9    4.583160745
## 10   4.556894270
## 11   4.313569222
## 12   4.184896804
## 13   4.156555526
## 14   4.045221482
## 15   4.017299156
## 16   3.952266201
## 17   3.898326470
## 18   3.719819195
## 19   3.705545286
## 20   3.702919494
## 21   3.697922855
## 22   3.547474602
## 23   3.532132827
## 24   3.459807600
## 25   3.458232690
## 26   3.283237195
## 27   3.276363051
## 28   3.262527678
## 29   3.167618871
## 30   3.161433452
## 31   3.146722148
## 32   3.093366099
## 33   3.084722440
## 34   3.074611214
## 35   3.045836125
## 36   3.010128723
## 37   2.970464786
## 38   2.910184980
## 39   2.878553776
## 40   2.878328101
## 41   2.870179191
## 42   2.829816472
## 43   2.829179141
## 44   2.822823097
## 45   2.806355508
## 46   2.779699032
## 47   2.757678143
## 48   2.742122630
## 49   2.736684216
## 50   2.714657415
## 51   2.706960014
## 52   2.633456581
## 53   2.629645682
## 54   2.617818890
## 55   2.615843512
## 56   2.576988205
## 57   2.562507470
## 58   2.521614613
## 59   2.518489090
## 60   2.502329759
## 61   2.488413141
## 62   2.463563596
## 63   2.463152369
## 64   2.453800318
## 65   2.437603457
## 66   2.427899947
## 67   2.406967403
## 68   2.401943320
## 69   2.381439895
## 70   2.352112960
## 71   2.347925166
## 72   2.323097935
## 73   2.314971690
## 74   2.302855905
## 75   2.272636364
## 76   2.244150273
## 77   2.242578563
## 78   2.220261857
## 79   2.208174967
## 80   2.207626122
## 81   2.200042058
## 82   2.192250060
## 83   2.187195643
## 84   2.185219258
## 85   2.157745295
## 86   2.143668997
## 87   2.136667541
## 88   2.133521482
## 89   2.115308420
## 90   2.094530313
## 91   2.087136467
## 92   2.085284939
## 93   2.052704827
## 94   2.040811695
## 95   2.033893599
## 96   2.031418304
## 97   2.023988338
## 98   2.003403068
## 99   2.000654041
## 100  1.996017000
## 101  1.974675425
## 102  1.962637816
## 103  1.944596934
## 104  1.939796868
## 105  1.920413644
## 106  1.902180602
## 107  1.892551477
## 108  1.892005066
## 109  1.889915061
## 110  1.855152784
## 111  1.848215389
## 112  1.834025609
## 113  1.785792676
## 114  1.778301071
## 115  1.773927258
## 116  1.756937720
## 117  1.753796218
## 118  1.749222895
## 119  1.731421647
## 120  1.721380455
## 121  1.719240261
## 122  1.715930521
## 123  1.698384903
## 124  1.682069528
## 125  1.667590663
## 126  1.643062600
## 127  1.641578766
## 128  1.635449866
## 129  1.627147053
## 130  1.622680613
## 131  1.577777185
## 132  1.576540145
## 133  1.576296223
## 134  1.571810508
## 135  1.571447475
## 136  1.524509457
## 137  1.507767488
## 138  1.507286195
## 139  1.485251213
## 140  1.474368654
## 141  1.457008395
## 142  1.447998741
## 143  1.438450761
## 144  1.434259615
## 145  1.432045302
## 146  1.429609257
## 147  1.419881991
## 148  1.418803105
## 149  1.417168704
## 150  1.401069991
## 151  1.343664591
## 152  1.340264739
## 153  1.337448187
## 154  1.330927069
## 155  1.318894292
## 156  1.289084210
## 157  1.275707957
## 158  1.254327516
## 159  1.253951736
## 160  1.241595266
## 161  1.235414852
## 162  1.213498655
## 163  1.212020845
## 164  1.210458975
## 165  1.196246267
## 166  1.186332633
## 167  1.161181928
## 168  1.143244894
## 169  1.139968390
## 170  1.125522954
## 171  1.115315401
## 172  1.084998451
## 173  1.063963864
## 174  1.034917888
## 175  1.032132637
## 176  1.015597103
## 177  1.005738210
## 178  0.999978446
## 179  0.989772919
## 180  0.989754432
## 181  0.975015067
## 182  0.974193410
## 183  0.945026434
## 184  0.941493764
## 185  0.932228635
## 186  0.930227967
## 187  0.919925395
## 188  0.916365835
## 189  0.908534570
## 190  0.900171040
## 191  0.887591914
## 192  0.876311254
## 193  0.858279311
## 194  0.848962742
## 195  0.813378972
## 196  0.806029950
## 197  0.804228287
## 198  0.783145170
## 199  0.781992194
## 200  0.776514930
## 201  0.771393503
## 202  0.745743264
## 203  0.725800447
## 204  0.720243639
## 205  0.711138549
## 206  0.687510870
## 207  0.686998704
## 208  0.679301485
## 209  0.678444574
## 210  0.657421255
## 211  0.651592532
## 212  0.646577574
## 213  0.646049748
## 214  0.635258323
## 215  0.631440204
## 216  0.622266486
## 217  0.596522235
## 218  0.589152431
## 219  0.570879288
## 220  0.558786606
## 221  0.535866250
## 222  0.525586474
## 223  0.512179472
## 224  0.505183815
## 225  0.491087596
## 226  0.484311922
## 227  0.465876744
## 228  0.465295092
## 229  0.450072540
## 230  0.420597788
## 231  0.406153018
## 232  0.392776978
## 233  0.377449167
## 234  0.352726489
## 235  0.324540249
## 236  0.319449978
## 237  0.317436989
## 238  0.310735275
## 239  0.304530249
## 240  0.291345427
## 241  0.288739787
## 242  0.269736981
## 243  0.266945228
## 244  0.266287802
## 245  0.237734142
## 246  0.207579880
## 247  0.198084446
## 248  0.175476012
## 249  0.169487562
## 250  0.159368173
## 251  0.126781963
## 252  0.116142122
## 253  0.089216160
## 254  0.084046321
## 255  0.078869851
## 256  0.071151577
## 257  0.043414415
## 258  0.032761215
## 259  0.011854184
## 260  0.004996699
## 261  0.000000000
## 262  0.000000000
## 263  0.000000000
## 264  0.000000000
## 265  0.000000000
## 266  0.000000000
## 267  0.000000000
## 268  0.000000000
## 269  0.000000000
## 270  0.000000000
## 271  0.000000000
if(METHOD_FEATURE_FLAG==1){
  # for the multi classification case, 
  # for each feature, we will choose the maximum importance value
  # Add a column for the maximum importance
  importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_model_LRM2_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df,n=20)$Feature)
  
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_LRM2_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc

  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(testData$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9048
## [1] "The auc value is:"
## Area under the curve: 0.9048

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # use palette indices 2, 3, ... for the curves so they match the legend colors
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}

9.3.3. Elastic Net

9.3.3.1 Elastic Net Model Training

df_ENM1<-processed_data 
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)

param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
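# Note (illustrative, not the grid used for the results below): alpha = 0:1
# expands to only the two endpoints, 0 (ridge) and 1 (lasso). To also search
# intermediate elastic-net mixing, a finer grid could be used, e.g.
#   expand.grid(alpha = seq(0, 1, by = 0.25), lambda = seq(0.001, 1, length = 20))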

elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                           trControl = ctrl, tuneGrid = param_grid)

print(elastic_net_model1)
## glmnet 
## 
## 454 samples
## 271 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      Accuracy   Kappa     
##   0      0.00100000  0.8060806  0.56042286
##   0      0.05357895  0.8214896  0.58612440
##   0      0.10615789  0.8193162  0.57402163
##   0      0.15873684  0.8170940  0.56534536
##   0      0.21131579  0.8192918  0.56553276
##   0      0.26389474  0.8148718  0.54883821
##   0      0.31647368  0.8126740  0.54000531
##   0      0.36905263  0.8016361  0.50914903
##   0      0.42163158  0.7950427  0.48786531
##   0      0.47421053  0.8016606  0.50030582
##   0      0.52678947  0.8038584  0.50490320
##   0      0.57936842  0.8016850  0.49697937
##   0      0.63194737  0.7862759  0.45208014
##   0      0.68452632  0.7862759  0.45018815
##   0      0.73710526  0.7796581  0.43020078
##   0      0.78968421  0.7796581  0.42825732
##   0      0.84226316  0.7708181  0.40042035
##   0      0.89484211  0.7708181  0.40042035
##   0      0.94742105  0.7642247  0.38002033
##   0      1.00000000  0.7576313  0.35950233
##   1      0.00100000  0.7333822  0.40247635
##   1      0.05357895  0.6564103  0.01945757
##   1      0.10615789  0.6585836  0.00000000
##   1      0.15873684  0.6585836  0.00000000
##   1      0.21131579  0.6585836  0.00000000
##   1      0.26389474  0.6585836  0.00000000
##   1      0.31647368  0.6585836  0.00000000
##   1      0.36905263  0.6585836  0.00000000
##   1      0.42163158  0.6585836  0.00000000
##   1      0.47421053  0.6585836  0.00000000
##   1      0.52678947  0.6585836  0.00000000
##   1      0.57936842  0.6585836  0.00000000
##   1      0.63194737  0.6585836  0.00000000
##   1      0.68452632  0.6585836  0.00000000
##   1      0.73710526  0.6585836  0.00000000
##   1      0.78968421  0.6585836  0.00000000
##   1      0.84226316  0.6585836  0.00000000
##   1      0.89484211  0.6585836  0.00000000
##   1      0.94742105  0.6585836  0.00000000
##   1      1.00000000  0.6585836  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.05357895.
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.728859
FeatEval_Freq_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Freq_mean_accuracy_cv_ENM1)
## [1] 0.728859
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_ENM1$DX)

FeatEval_Freq_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.997797356828194"
print(FeatEval_Freq_ENM1_trainAccuracy)
## [1] 0.9977974
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Freq_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Freq_ENM1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 124  17
##         CN   4  49
##                                           
##                Accuracy : 0.8918          
##                  95% CI : (0.8393, 0.9317)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 7.689e-14       
##                                           
##                   Kappa : 0.7468          
##                                           
##  Mcnemar's Test P-Value : 0.008829        
##                                           
##             Sensitivity : 0.9688          
##             Specificity : 0.7424          
##          Pos Pred Value : 0.8794          
##          Neg Pred Value : 0.9245          
##              Prevalence : 0.6598          
##          Detection Rate : 0.6392          
##    Detection Prevalence : 0.7268          
##       Balanced Accuracy : 0.8556          
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Freq_ENM1_Accuracy<-cm_FeatEval_Freq_ENM1$overall["Accuracy"]
cm_FeatEval_Freq_ENM1_Kappa<-cm_FeatEval_Freq_ENM1$overall["Kappa"]
print(cm_FeatEval_Freq_ENM1_Accuracy)
##  Accuracy 
## 0.8917526
print(cm_FeatEval_Freq_ENM1_Kappa)
##     Kappa 
## 0.7467993
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 271)
## 
##            Overall
## PC3         100.00
## PC2          99.27
## PC1          86.23
## cg23432430   64.21
## cg09727210   59.83
## cg07158503   57.69
## cg00962106   54.89
## cg06697310   51.23
## cg02225060   50.18
## cg09015880   47.62
## cg16338321   46.05
## cg26081710   43.78
## cg00819121   43.52
## cg00415024   42.70
## cg05064044   41.42
## cg01910713   41.39
## cg10701746   41.20
## cg27272246   40.77
## cg06277607   40.45
## cg02887598   40.15
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")

importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)

library(dplyr)
library(tibble)

# arrange() drops data.frame rownames, so move the CpG names into a
# Feature column first; otherwise the printed ranking loses its labels
# (as happened in the output below).
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>%
  rownames_to_column("Feature") %>%
  arrange(desc(Overall))

print(Ordered_importance_elastic_net_final_model1)

}
##          Overall
## 1   1.8493115613
## 2   1.8358013771
## 3   1.5946672436
## 4   1.1876736663
## 5   1.1067238101
## 6   1.0670838066
## 7   1.0153638313
## 8   0.9477193404
## 9   0.9283402490
## 10  0.8809056124
## 11  0.8518929277
## 12  0.8098635645
## 13  0.8052137932
## 14  0.7899369844
## 15  0.7662250778
## 16  0.7658172477
## 17  0.7621766472
## 18  0.7543094710
## 19  0.7483591793
## 20  0.7428418527
## 21  0.7370550001
## 22  0.7345879704
## 23  0.7291891992
## 24  0.7237499559
## 25  0.7234664891
## 26  0.7176835313
## 27  0.7103256136
## 28  0.6933711824
## 29  0.6910379390
## 30  0.6909253276
## 31  0.6889075545
## 32  0.6778879684
## 33  0.6694058886
## 34  0.6495919138
## 35  0.6472677787
## 36  0.6466759647
## 37  0.6436230870
## 38  0.6419744634
## 39  0.6416811412
## 40  0.6401104316
## 41  0.6382530670
## 42  0.6375973534
## 43  0.6303285637
## 44  0.6298241867
## 45  0.6104356490
## 46  0.6004971240
## 47  0.6001497818
## 48  0.5884350622
## 49  0.5877142067
## 50  0.5866597569
## 51  0.5858906733
## 52  0.5853965581
## 53  0.5769541631
## 54  0.5766865973
## 55  0.5765756344
## 56  0.5713605652
## 57  0.5662492758
## 58  0.5649821706
## 59  0.5639555642
## 60  0.5563301612
## 61  0.5554272925
## 62  0.5496899748
## 63  0.5454664951
## 64  0.5448004913
## 65  0.5446274489
## 66  0.5387270092
## 67  0.5363102173
## 68  0.5362314921
## 69  0.5351129224
## 70  0.5325256170
## 71  0.5300768193
## 72  0.5291223498
## 73  0.5282280096
## 74  0.5271573438
## 75  0.5257567326
## 76  0.5229583408
## 77  0.5191119635
## 78  0.5154802945
## 79  0.5154027133
## 80  0.5144019458
## 81  0.5102932226
## 82  0.5090123941
## 83  0.5088852586
## 84  0.5065934678
## 85  0.5037571785
## 86  0.5023174669
## 87  0.5017157418
## 88  0.4971967579
## 89  0.4958806900
## 90  0.4946592979
## 91  0.4905155697
## 92  0.4896797836
## 93  0.4853499311
## 94  0.4817303965
## 95  0.4797668895
## 96  0.4790947038
## 97  0.4734622154
## 98  0.4710825245
## 99  0.4696588407
## 100 0.4669147028
## 101 0.4620626133
## 102 0.4610588959
## 103 0.4609080027
## 104 0.4598423587
## 105 0.4589796090
## 106 0.4504446076
## 107 0.4373777102
## 108 0.4370669809
## 109 0.4361943331
## 110 0.4335484239
## 111 0.4333827656
## 112 0.4296113833
## 113 0.4265132297
## 114 0.4258303767
## 115 0.4208723745
## 116 0.4208508164
## 117 0.4187083705
## 118 0.4162067601
## 119 0.4158067962
## 120 0.4151394270
## 121 0.4095282911
## 122 0.4069226179
## 123 0.4004364072
## 124 0.3982299986
## 125 0.3966540029
## 126 0.3949188399
## 127 0.3939919367
## 128 0.3896701395
## 129 0.3851262540
## 130 0.3834717302
## 131 0.3772182892
## 132 0.3762560920
## 133 0.3744889877
## 134 0.3728007636
## 135 0.3716097100
## 136 0.3715629978
## 137 0.3713296252
## 138 0.3702936431
## 139 0.3673695143
## 140 0.3642798962
## 141 0.3627276349
## 142 0.3614046908
## 143 0.3609346302
## 144 0.3580747433
## 145 0.3567424549
## 146 0.3555622616
## 147 0.3531851737
## 148 0.3512146612
## 149 0.3470954054
## 150 0.3463001278
## 151 0.3441222629
## 152 0.3385432894
## 153 0.3364968999
## 154 0.3356213656
## 155 0.3337241209
## 156 0.3327535424
## 157 0.3315367843
## 158 0.3308802498
## 159 0.3305939419
## 160 0.3302764633
## 161 0.3268513775
## 162 0.3192226054
## 163 0.3185161538
## 164 0.3160464282
## 165 0.3149973467
## 166 0.3141318614
## 167 0.3084220374
## 168 0.3077499088
## 169 0.3077282023
## 170 0.3073826389
## 171 0.3067509685
## 172 0.3062632827
## 173 0.3051063316
## 174 0.3040890204
## 175 0.3031317321
## 176 0.3013387831
## 177 0.3012498081
## 178 0.2999740727
## 179 0.2993524471
## 180 0.2966790175
## 181 0.2966382204
## 182 0.2948615357
## 183 0.2896771269
## 184 0.2893892721
## 185 0.2854971309
## 186 0.2837102254
## 187 0.2777232361
## 188 0.2757692073
## 189 0.2724632156
## 190 0.2716497354
## 191 0.2700906132
## 192 0.2694163701
## 193 0.2680488904
## 194 0.2667896485
## 195 0.2666732582
## 196 0.2629907604
## 197 0.2607211185
## 198 0.2588766294
## 199 0.2587502563
## 200 0.2526856587
## 201 0.2522339904
## 202 0.2506223932
## 203 0.2484234101
## 204 0.2469683571
## 205 0.2467512442
## 206 0.2465621628
## 207 0.2455025137
## 208 0.2451478928
## 209 0.2416462134
## 210 0.2408669474
## 211 0.2406027707
## 212 0.2399025198
## 213 0.2390230887
## 214 0.2380426677
## 215 0.2320773903
## 216 0.2296522868
## 217 0.2269593461
## 218 0.2267005655
## 219 0.2264591244
## 220 0.2245633017
## 221 0.2216640101
## 222 0.2213863565
## 223 0.2202808788
## 224 0.2187358387
## 225 0.2183393943
## 226 0.2181917400
## 227 0.2084412253
## 228 0.2069746386
## 229 0.2034632790
## 230 0.2022178798
## 231 0.2000299851
## 232 0.1977926363
## 233 0.1964109856
## 234 0.1913950354
## 235 0.1879510169
## 236 0.1857348268
## 237 0.1816455319
## 238 0.1779820954
## 239 0.1779750370
## 240 0.1769033380
## 241 0.1709653389
## 242 0.1700003097
## 243 0.1682468269
## 244 0.1672010662
## 245 0.1641445535
## 246 0.1632085082
## 247 0.1628849938
## 248 0.1619756944
## 249 0.1619512493
## 250 0.1581173278
## 251 0.1575501892
## 252 0.1524909997
## 253 0.1521293734
## 254 0.1482324149
## 255 0.1448325823
## 256 0.1332072552
## 257 0.1172551906
## 258 0.1133975009
## 259 0.1089556076
## 260 0.1053521773
## 261 0.0950868732
## 262 0.0813087633
## 263 0.0767632044
## 264 0.0552451974
## 265 0.0514541861
## 266 0.0443793791
## 267 0.0407841481
## 268 0.0181078631
## 269 0.0105839280
## 270 0.0027205719
## 271 0.0005666358
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, for each feature we keep the maximum
  # importance value across classes and add it as a MaxImportance column.
  importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_elastic_net_model1_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_elastic_net_model1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_elastic_net_model1_df,n=20)$Feature)
  
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_elastic_net_model1_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")

  roc_curve <- roc(testData_ENM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_ENM1_AUC<-auc_value

  print(auc_value)  

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.9315

if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_ENM1_AUC<-mean_auc
}
print(FeatEval_Freq_ENM1_AUC)
## Area under the curve: 0.9315

9.3.4. XGBoost

9.3.4.1 XGBoost Model Training

library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data 
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

xgb_model <- caret::train(
  DX ~ ., data = trainData_XGB1,
  method = "xgbTree", trControl = cv_control,
  metric = "Accuracy"
)

print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 454 samples
## 271 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa       
##   0.3  1          0.6               0.50        50      0.6146276   0.054908838
##   0.3  1          0.6               0.50       100      0.6256166   0.081796805
##   0.3  1          0.6               0.50       150      0.6432723   0.126856863
##   0.3  1          0.6               0.75        50      0.6299145   0.076070615
##   0.3  1          0.6               0.75       100      0.6321368   0.091184457
##   0.3  1          0.6               0.75       150      0.6629792   0.174070786
##   0.3  1          0.6               1.00        50      0.6013187  -0.041548935
##   0.3  1          0.6               1.00       100      0.6254945   0.045271510
##   0.3  1          0.6               1.00       150      0.6254701   0.058088855
##   0.3  1          0.8               0.50        50      0.6410745   0.110650208
##   0.3  1          0.8               0.50       100      0.6696215   0.206447934
##   0.3  1          0.8               0.50       150      0.6675214   0.203828227
##   0.3  1          0.8               0.75        50      0.6014164  -0.018674793
##   0.3  1          0.8               0.75       100      0.6146764   0.066136854
##   0.3  1          0.8               0.75       150      0.6454701   0.138914454
##   0.3  1          0.8               1.00        50      0.5991209  -0.042358922
##   0.3  1          0.8               1.00       100      0.6166545   0.029709360
##   0.3  1          0.8               1.00       150      0.6211966   0.059065952
##   0.3  2          0.6               0.50        50      0.6917460   0.226621807
##   0.3  2          0.6               0.50       100      0.6916972   0.236360996
##   0.3  2          0.6               0.50       150      0.6938950   0.256663466
##   0.3  2          0.6               0.75        50      0.6321123   0.088794883
##   0.3  2          0.6               0.75       100      0.6629792   0.148837573
##   0.3  2          0.6               0.75       150      0.6850305   0.193276544
##   0.3  2          0.6               1.00        50      0.6476679   0.092450308
##   0.3  2          0.6               1.00       100      0.6344567   0.063108930
##   0.3  2          0.6               1.00       150      0.6431990   0.091487310
##   0.3  2          0.8               0.50        50      0.6475946   0.144508904
##   0.3  2          0.8               0.50       100      0.6563370   0.164804668
##   0.3  2          0.8               0.50       150      0.6695726   0.194322990
##   0.3  2          0.8               0.75        50      0.6410012   0.102926406
##   0.3  2          0.8               0.75       100      0.6674969   0.167633971
##   0.3  2          0.8               0.75       150      0.6675214   0.170698970
##   0.3  2          0.8               1.00        50      0.6344078   0.076389473
##   0.3  2          0.8               1.00       100      0.6410501   0.083690477
##   0.3  2          0.8               1.00       150      0.6543101   0.120746584
##   0.3  3          0.6               0.50        50      0.6827350   0.200728389
##   0.3  3          0.6               0.50       100      0.6959951   0.239172563
##   0.3  3          0.6               0.50       150      0.7136264   0.289917693
##   0.3  3          0.6               0.75        50      0.6365812   0.094015704
##   0.3  3          0.6               0.75       100      0.6520147   0.134724253
##   0.3  3          0.6               0.75       150      0.6674237   0.174092992
##   0.3  3          0.6               1.00        50      0.6277167   0.039385040
##   0.3  3          0.6               1.00       100      0.6344322   0.053748339
##   0.3  3          0.6               1.00       150      0.6322100   0.047571775
##   0.3  3          0.8               0.50        50      0.6740415   0.190759415
##   0.3  3          0.8               0.50       100      0.6828083   0.205652081
##   0.3  3          0.8               0.50       150      0.6916484   0.229551613
##   0.3  3          0.8               0.75        50      0.6410256   0.088331764
##   0.3  3          0.8               0.75       100      0.6563858   0.132888499
##   0.3  3          0.8               0.75       150      0.6519658   0.126876857
##   0.3  3          0.8               1.00        50      0.6388034   0.071728537
##   0.3  3          0.8               1.00       100      0.6542125   0.115431674
##   0.3  3          0.8               1.00       150      0.6542369   0.103789283
##   0.4  1          0.6               0.50        50      0.6388523   0.136508711
##   0.4  1          0.6               0.50       100      0.6762149   0.235915834
##   0.4  1          0.6               0.50       150      0.7004884   0.297921992
##   0.4  1          0.6               0.75        50      0.5879609   0.017619170
##   0.4  1          0.6               0.75       100      0.6410012   0.137929378
##   0.4  1          0.6               0.75       150      0.6499389   0.156140289
##   0.4  1          0.6               1.00        50      0.5990476  -0.019074926
##   0.4  1          0.6               1.00       100      0.6013431   0.004761008
##   0.4  1          0.6               1.00       150      0.6211722   0.064869505
##   0.4  1          0.8               0.50        50      0.6078632   0.075417232
##   0.4  1          0.8               0.50       100      0.6518926   0.164997842
##   0.4  1          0.8               0.50       150      0.6739194   0.223978821
##   0.4  1          0.8               0.75        50      0.6321856   0.098521482
##   0.4  1          0.8               0.75       100      0.6278144   0.103092505
##   0.4  1          0.8               0.75       150      0.6343834   0.119960199
##   0.4  1          0.8               1.00        50      0.5945543  -0.051606636
##   0.4  1          0.8               1.00       100      0.5990476  -0.004820168
##   0.4  1          0.8               1.00       150      0.6167766   0.067564974
##   0.4  2          0.6               0.50        50      0.6212943   0.109003764
##   0.4  2          0.6               0.50       100      0.6366300   0.136130595
##   0.4  2          0.6               0.50       150      0.6652259   0.189714886
##   0.4  2          0.6               0.75        50      0.6585104   0.157922161
##   0.4  2          0.6               0.75       100      0.6740171   0.192270104
##   0.4  2          0.6               0.75       150      0.6806349   0.212798146
##   0.4  2          0.6               1.00        50      0.6299634   0.074903123
##   0.4  2          0.6               1.00       100      0.6387790   0.085814383
##   0.4  2          0.6               1.00       150      0.6476190   0.112969503
##   0.4  2          0.8               0.50        50      0.6277656   0.113518254
##   0.4  2          0.8               0.50       100      0.6696703   0.193457486
##   0.4  2          0.8               0.50       150      0.6807082   0.228893323
##   0.4  2          0.8               0.75        50      0.6476435   0.134335094
##   0.4  2          0.8               0.75       100      0.6475946   0.128323611
##   0.4  2          0.8               0.75       150      0.6652015   0.182324605
##   0.4  2          0.8               1.00        50      0.6475946   0.113857078
##   0.4  2          0.8               1.00       100      0.6586325   0.137368428
##   0.4  2          0.8               1.00       150      0.6608303   0.149771016
##   0.4  3          0.6               0.50        50      0.6784615   0.222287479
##   0.4  3          0.6               0.50       100      0.6915995   0.248303704
##   0.4  3          0.6               0.50       150      0.6894505   0.240392779
##   0.4  3          0.6               0.75        50      0.6365324   0.108994896
##   0.4  3          0.6               0.75       100      0.6365568   0.102537766
##   0.4  3          0.6               0.75       150      0.6520147   0.141759767
##   0.4  3          0.6               1.00        50      0.6388034   0.053838745
##   0.4  3          0.6               1.00       100      0.6542125   0.105825845
##   0.4  3          0.6               1.00       150      0.6520391   0.098852081
##   0.4  3          0.8               0.50        50      0.6563858   0.144810525
##   0.4  3          0.8               0.50       100      0.6585348   0.143394171
##   0.4  3          0.8               0.50       150      0.6783394   0.203188245
##   0.4  3          0.8               0.75        50      0.6344078   0.115372746
##   0.4  3          0.8               0.75       100      0.6431746   0.128434039
##   0.4  3          0.8               0.75       150      0.6476435   0.128580660
##   0.4  3          0.8               1.00        50      0.6540904   0.128964574
##   0.4  3          0.8               1.00       100      0.6452991   0.094187912
##   0.4  3          0.8               1.00       150      0.6540904   0.119240164
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
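Rather than reading the selected hyper-parameters off the log above, they can also be pulled from the fitted object, as the SVM section below does with bestTune; a minimal sketch:

print(xgb_model$bestTune)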
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.648166
FeatEval_Freq_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Freq_mean_accuracy_cv_xgb)
## [1] 0.648166
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")

train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Freq_xgb_trainAccuracy <- train_accuracy

print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
print(FeatEval_Freq_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Freq_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Freq_xgb)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 113  46
##         CN  15  20
##                                           
##                Accuracy : 0.6856          
##                  95% CI : (0.6152, 0.7502)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 0.2490658       
##                                           
##                   Kappa : 0.2097          
##                                           
##  Mcnemar's Test P-Value : 0.0001225       
##                                           
##             Sensitivity : 0.8828          
##             Specificity : 0.3030          
##          Pos Pred Value : 0.7107          
##          Neg Pred Value : 0.5714          
##              Prevalence : 0.6598          
##          Detection Rate : 0.5825          
##    Detection Prevalence : 0.8196          
##       Balanced Accuracy : 0.5929          
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Freq_xgb_Accuracy <-cm_FeatEval_Freq_xgb$overall["Accuracy"]
cm_FeatEval_Freq_xgb_Kappa <-cm_FeatEval_Freq_xgb$overall["Kappa"]

print(cm_FeatEval_Freq_xgb_Accuracy)
## Accuracy 
## 0.685567
print(cm_FeatEval_Freq_xgb_Kappa)
##     Kappa 
## 0.2096968
importance_xgb_model<- varImp(xgb_model)

print(importance_xgb_model)
## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 271)
## 
##            Overall
## cg26983017  100.00
## cg07504457   96.11
## cg23916408   90.50
## cg11187460   82.69
## cg18285382   79.66
## cg11787167   78.36
## cg03749159   75.14
## cg05161773   72.63
## cg21697769   68.71
## cg15633912   67.82
## PC2          66.79
## cg06697310   63.20
## cg11331837   62.96
## cg15600437   62.33
## cg02823329   61.35
## cg05876883   60.52
## cg25436480   57.95
## cg16202259   56.92
## cg07158503   55.87
## cg19301366   54.30
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")

importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)

ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
##         Feature         Gain        Cover   Frequency   Importance
##          <char>        <num>        <num>       <num>        <num>
##   1: cg26983017 2.205161e-02 0.0115536297 0.005253940 2.205161e-02
##   2: cg07504457 2.119472e-02 0.0123073576 0.008756567 2.119472e-02
##   3: cg23916408 1.995766e-02 0.0178778555 0.008756567 1.995766e-02
##   4: cg11187460 1.823413e-02 0.0122026553 0.007005254 1.823413e-02
##   5: cg18285382 1.756551e-02 0.0136433644 0.005253940 1.756551e-02
##  ---                                                              
## 232: cg04497611 8.810252e-05 0.0004279974 0.001751313 8.810252e-05
## 233: cg21243064 8.531156e-05 0.0004553075 0.001751313 8.531156e-05
## 234: cg17723206 5.402373e-05 0.0006496913 0.001751313 5.402373e-05
## 235: cg20078646 1.750849e-05 0.0003917131 0.001751313 1.750849e-05
## 236: cg01910713 1.473237e-05 0.0004157667 0.001751313 1.473237e-05
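If only the top-ranked CpGs are needed downstream, the ordered importance table can be trimmed directly; the cutoff of 20 below is illustrative, not a value used elsewhere in this pipeline:

# Top 20 CpGs by Gain-based importance (illustrative cutoff)
top20_xgb_features <- head(ordered_importance$Feature, 20)
print(top20_xgb_features)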
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")

  roc_curve <- roc(testData_XGB1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(testData_XGB1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_xgb_AUC <- auc_value

  print(auc_value)  
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7293

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_XGB1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_xgb_AUC <- mean_auc
}
print(FeatEval_Freq_xgb_AUC)
## Area under the curve: 0.7293

9.3.5. Random Forest

9.3.5.1 Random Forest Model Training

library(caret)
library(randomForest)
df_RFM1<-processed_data 
featureName_RFM1<-AfterProcess_FeatureName

set.seed(123) 
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]

X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

rf_model <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  metric = "Accuracy",
  importance = TRUE
)

print(rf_model)
## Random Forest 
## 
## 454 samples
## 271 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##     2   0.6630037  0.01679579
##   136   0.6718437  0.06462992
##   271   0.6586081  0.04595409
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 136.
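Note that caret's default grid for method = "rf" tries only the three mtry values shown above. A denser grid could be searched instead; the values below are illustrative and were not used for the results in this report:

# Illustrative mtry grid (sketch only; values are assumptions)
rf_grid <- expand.grid(mtry = c(2, 8, 32, 136, 271))
# rf_model <- caret::train(DX ~ ., data = train_data_RFM1, method = "rf",
#                          trControl = ctrl, tuneGrid = rf_grid,
#                          metric = "Accuracy", importance = TRUE)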
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6644851
FeatEval_Freq_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Freq_mean_accuracy_cv_rf)
## [1] 0.6644851
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")


train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  1"
FeatEval_Freq_rf_trainAccuracy<-train_accuracy
print(FeatEval_Freq_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Freq_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Freq_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 128  62
##         CN   0   4
##                                           
##                Accuracy : 0.6804          
##                  95% CI : (0.6098, 0.7454)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 0.3             
##                                           
##                   Kappa : 0.0785          
##                                           
##  Mcnemar's Test P-Value : 9.408e-15       
##                                           
##             Sensitivity : 1.00000         
##             Specificity : 0.06061         
##          Pos Pred Value : 0.67368         
##          Neg Pred Value : 1.00000         
##              Prevalence : 0.65979         
##          Detection Rate : 0.65979         
##    Detection Prevalence : 0.97938         
##       Balanced Accuracy : 0.53030         
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Freq_rf_Accuracy<-cm_FeatEval_Freq_rf$overall["Accuracy"]
print(cm_FeatEval_Freq_rf_Accuracy)
##  Accuracy 
## 0.6804124
cm_FeatEval_Freq_rf_Kappa<-cm_FeatEval_Freq_rf$overall["Kappa"]
print(cm_FeatEval_Freq_rf_Kappa)
##      Kappa 
## 0.07845541
importance_rf_model <- varImp(rf_model)

print(importance_rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 271)
## 
##            Importance
## cg21697769     100.00
## cg11331837      98.62
## cg03749159      88.25
## cg11133939      86.05
## cg01008088      85.80
## cg03982462      84.41
## cg05234269      82.48
## cg07138269      81.47
## cg00004073      81.43
## cg18857647      80.97
## cg11314779      78.30
## cg02887598      78.13
## cg01910713      77.48
## cg09120722      76.36
## cg09584650      75.14
## cg17268094      74.22
## cg25879395      73.47
## cg04888234      73.21
## cg04768387      72.82
## cg07158503      72.48
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")

importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5){

importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
library(tibble)

# Keep the CpG names: arrange() drops data.frame rownames.
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
  rownames_to_column("Feature") %>%
  arrange(desc(MCI))

print(Ordered_importance_rf_final_model)

}
if( METHOD_FEATURE_FLAG==4||METHOD_FEATURE_FLAG==6){

importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
library(tibble)

# Keep the CpG names: arrange() drops data.frame rownames.
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
  rownames_to_column("Feature") %>%
  arrange(desc(Dementia))

print(Ordered_importance_rf_final_model)

}
if( METHOD_FEATURE_FLAG==3){

importance_rf_final_model <- varImp(rf_model$finalModel)

library(dplyr)
library(tibble)

# Keep the CpG names: arrange() drops data.frame rownames.
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
  rownames_to_column("Feature") %>%
  arrange(desc(CI))

print(Ordered_importance_rf_final_model)

}
##                CI            CN
## 1    2.7875875467  2.7875875467
## 2    2.7222191735  2.7222191735
## 3    2.2322067610  2.2322067610
## 4    2.1280199497  2.1280199497
## 5    2.1161400764  2.1161400764
## 6    2.0505829780  2.0505829780
## 7    1.9590061962  1.9590061962
## 8    1.9116267791  1.9116267791
## 9    1.9094896734  1.9094896734
## 10   1.8879546622  1.8879546622
## 11   1.7616249927  1.7616249927
## 12   1.7535804913  1.7535804913
## 13   1.7226756514  1.7226756514
## 14   1.6699584311  1.6699584311
## 15   1.6120484124  1.6120484124
## 16   1.5687887056  1.5687887056
## 17   1.5331154656  1.5331154656
## 18   1.5207745177  1.5207745177
## 19   1.5024244818  1.5024244818
## 20   1.4865795375  1.4865795375
## 21   1.4622730635  1.4622730635
## 22   1.4583231854  1.4583231854
## 23   1.4497536779  1.4497536779
## 24   1.4472665331  1.4472665331
## 25   1.4455641078  1.4455641078
## 26   1.4197021378  1.4197021378
## 27   1.4183708437  1.4183708437
## 28   1.3704330366  1.3704330366
## 29   1.3630114523  1.3630114523
## 30   1.3467348848  1.3467348848
## 31   1.3184220360  1.3184220360
## 32   1.3165446664  1.3165446664
## 33   1.3067956999  1.3067956999
## 34   1.3053695749  1.3053695749
## 35   1.2380382363  1.2380382363
## 36   1.2293425574  1.2293425574
## 37   1.2224020892  1.2224020892
## 38   1.2169307639  1.2169307639
## 39   1.1097280115  1.1097280115
## 40   1.1034993781  1.1034993781
## 41   1.0815521836  1.0815521836
## 42   1.0496202332  1.0496202332
## 43   1.0096525877  1.0096525877
## 44   1.0022162870  1.0022162870
## 45   0.9946322009  0.9946322009
## 46   0.9833587919  0.9833587919
## 47   0.9652707883  0.9652707883
## 48   0.9566642502  0.9566642502
## 49   0.9299639748  0.9299639748
## 50   0.9204012473  0.9204012473
## 51   0.8908010399  0.8908010399
## 52   0.8812282057  0.8812282057
## 53   0.8805020278  0.8805020278
## 54   0.8644525651  0.8644525651
## 55   0.8534893219  0.8534893219
## 56   0.8483108996  0.8483108996
## 57   0.8461219800  0.8461219800
## 58   0.8311482726  0.8311482726
## 59   0.8001771234  0.8001771234
## 60   0.7941748621  0.7941748621
## 61   0.7815781359  0.7815781359
## 62   0.7777573344  0.7777573344
## 63   0.7434743876  0.7434743876
## 64   0.7198603440  0.7198603440
## 65   0.7193974412  0.7193974412
## 66   0.7171485930  0.7171485930
## 67   0.6959333531  0.6959333531
## 68   0.6864352331  0.6864352331
## 69   0.6825454126  0.6825454126
## 70   0.6813962648  0.6813962648
## 71   0.6797513875  0.6797513875
## 72   0.6783283310  0.6783283310
## 73   0.6720741547  0.6720741547
## 74   0.6694068787  0.6694068787
## 75   0.6624920927  0.6624920927
## 76   0.6547077039  0.6547077039
## 77   0.6522383813  0.6522383813
## 78   0.6337729728  0.6337729728
## 79   0.6297529954  0.6297529954
## 80   0.6282990770  0.6282990770
## 81   0.6241608824  0.6241608824
## 82   0.6147973472  0.6147973472
## 83   0.5955885197  0.5955885197
## 84   0.5917514705  0.5917514705
## 85   0.5895207014  0.5895207014
## 86   0.5885567084  0.5885567084
## 87   0.5780191999  0.5780191999
## 88   0.5588196647  0.5588196647
## 89   0.5566613917  0.5566613917
## 90   0.5375559383  0.5375559383
## 91   0.5364199053  0.5364199053
## 92   0.5313564802  0.5313564802
## 93   0.5297984673  0.5297984673
## 94   0.5241652261  0.5241652261
## 95   0.5225682560  0.5225682560
## 96   0.5062385751  0.5062385751
## 97   0.4932552137  0.4932552137
## 98   0.4932507867  0.4932507867
## 99   0.4837075315  0.4837075315
## 100  0.4822657106  0.4822657106
## 101  0.4818489830  0.4818489830
## 102  0.4519333921  0.4519333921
## 103  0.4312164931  0.4312164931
## 104  0.4151714978  0.4151714978
## 105  0.4029872749  0.4029872749
## 106  0.3947882634  0.3947882634
## 107  0.3777553026  0.3777553026
## 108  0.3597576445  0.3597576445
## 109  0.3472566892  0.3472566892
## 110  0.3467891474  0.3467891474
## 111  0.3414835417  0.3414835417
## 112  0.3324760106  0.3324760106
## 113  0.3152987043  0.3152987043
## 114  0.3023018684  0.3023018684
## 115  0.2883911587  0.2883911587
## 116  0.2800937721  0.2800937721
## 117  0.2787556157  0.2787556157
## 118  0.2688167952  0.2688167952
## 119  0.2679314095  0.2679314095
## 120  0.2475683074  0.2475683074
## 121  0.2462103094  0.2462103094
## 122  0.2439460309  0.2439460309
## 123  0.2356991189  0.2356991189
## 124  0.2187805412  0.2187805412
## 125  0.2173915680  0.2173915680
## 126  0.2092443171  0.2092443171
## 127  0.2084082463  0.2084082463
## 128  0.2029529373  0.2029529373
## 129  0.2003798803  0.2003798803
## 130  0.1900588742  0.1900588742
## 131  0.1862078969  0.1862078969
## 132  0.1654443717  0.1654443717
## 133  0.1617037263  0.1617037263
## 134  0.1531806048  0.1531806048
## 135  0.1423141757  0.1423141757
## 136  0.1363187452  0.1363187452
## 137  0.1232062894  0.1232062894
## 138  0.1104412248  0.1104412248
## 139  0.0884937539  0.0884937539
## 140  0.0839741379  0.0839741379
## 141  0.0804419256  0.0804419256
## 142  0.0785402700  0.0785402700
## 143  0.0726191100  0.0726191100
## 144  0.0645239661  0.0645239661
## 145  0.0506061632  0.0506061632
## 146  0.0458256131  0.0458256131
## 147  0.0295500822  0.0295500822
## 148  0.0074691077  0.0074691077
## 149 -0.0007011184 -0.0007011184
## 150 -0.0064416505 -0.0064416505
## 151 -0.0155381997 -0.0155381997
## 152 -0.0224435473 -0.0224435473
## 153 -0.0230627597 -0.0230627597
## 154 -0.0272048655 -0.0272048655
## 155 -0.0333093091 -0.0333093091
## 156 -0.0478401951 -0.0478401951
## 157 -0.0497759308 -0.0497759308
## 158 -0.0543289821 -0.0543289821
## 159 -0.0562290841 -0.0562290841
## 160 -0.0634126743 -0.0634126743
## 161 -0.0652337962 -0.0652337962
## 162 -0.0900412182 -0.0900412182
## 163 -0.1013843202 -0.1013843202
## 164 -0.1024504213 -0.1024504213
## 165 -0.1124549912 -0.1124549912
## 166 -0.1148361355 -0.1148361355
## 167 -0.1193318119 -0.1193318119
## 168 -0.1236664515 -0.1236664515
## 169 -0.1294409066 -0.1294409066
## 170 -0.1340565917 -0.1340565917
## 171 -0.1406496500 -0.1406496500
## 172 -0.1415512436 -0.1415512436
## 173 -0.1482721385 -0.1482721385
## 174 -0.1568179428 -0.1568179428
## 175 -0.1651867185 -0.1651867185
## 176 -0.1755212656 -0.1755212656
## 177 -0.1770466111 -0.1770466111
## 178 -0.1814507642 -0.1814507642
## 179 -0.1993313467 -0.1993313467
## 180 -0.2046496300 -0.2046496300
## 181 -0.2065001318 -0.2065001318
## 182 -0.2101802223 -0.2101802223
## 183 -0.2206178408 -0.2206178408
## 184 -0.2220606542 -0.2220606542
## 185 -0.2274809064 -0.2274809064
## 186 -0.2369917638 -0.2369917638
## 187 -0.2435812004 -0.2435812004
## 188 -0.2489134617 -0.2489134617
## 189 -0.2515380234 -0.2515380234
## 190 -0.2698387881 -0.2698387881
## 191 -0.2798640474 -0.2798640474
## 192 -0.2875745946 -0.2875745946
## 193 -0.2999117038 -0.2999117038
## 194 -0.3188356581 -0.3188356581
## 195 -0.3202366014 -0.3202366014
## 196 -0.3332083570 -0.3332083570
## 197 -0.3553729528 -0.3553729528
## 198 -0.3608986536 -0.3608986536
## 199 -0.3736483805 -0.3736483805
## 200 -0.3833792398 -0.3833792398
## 201 -0.3836541054 -0.3836541054
## 202 -0.3906954770 -0.3906954770
## 203 -0.3952586258 -0.3952586258
## 204 -0.4005372341 -0.4005372341
## 205 -0.4191896315 -0.4191896315
## 206 -0.4200863936 -0.4200863936
## 207 -0.4257586064 -0.4257586064
## 208 -0.4258943439 -0.4258943439
## 209 -0.4271015000 -0.4271015000
## 210 -0.4454602463 -0.4454602463
## 211 -0.4637763052 -0.4637763052
## 212 -0.4846128405 -0.4846128405
## 213 -0.4969050331 -0.4969050331
## 214 -0.5234514600 -0.5234514600
## 215 -0.5400280095 -0.5400280095
## 216 -0.5438365588 -0.5438365588
## 217 -0.5587734786 -0.5587734786
## 218 -0.5712055552 -0.5712055552
## 219 -0.5812270707 -0.5812270707
## 220 -0.6136262629 -0.6136262629
## 221 -0.6137377152 -0.6137377152
## 222 -0.6284929719 -0.6284929719
## 223 -0.6298347214 -0.6298347214
## 224 -0.6459461188 -0.6459461188
## 225 -0.6537186930 -0.6537186930
## 226 -0.6734727658 -0.6734727658
## 227 -0.6743991521 -0.6743991521
## 228 -0.6968637701 -0.6968637701
## 229 -0.6997265346 -0.6997265346
## 230 -0.7379594719 -0.7379594719
## 231 -0.7621201667 -0.7621201667
## 232 -0.7643717776 -0.7643717776
## 233 -0.7672932463 -0.7672932463
## 234 -0.7968403219 -0.7968403219
## 235 -0.8254412128 -0.8254412128
## 236 -0.8726435018 -0.8726435018
## 237 -0.8910936779 -0.8910936779
## 238 -0.9185024303 -0.9185024303
## 239 -0.9349863320 -0.9349863320
## 240 -0.9404324166 -0.9404324166
## 241 -0.9415220697 -0.9415220697
## 242 -0.9427086057 -0.9427086057
## 243 -0.9505149105 -0.9505149105
## 244 -0.9703707229 -0.9703707229
## 245 -0.9858251373 -0.9858251373
## 246 -1.0003229474 -1.0003229474
## 247 -1.0031500942 -1.0031500942
## 248 -1.0093379956 -1.0093379956
## 249 -1.0111287731 -1.0111287731
## 250 -1.0171877747 -1.0171877747
## 251 -1.0330678235 -1.0330678235
## 252 -1.0936076931 -1.0936076931
## 253 -1.1072980280 -1.1072980280
## 254 -1.1502807625 -1.1502807625
## 255 -1.2362346877 -1.2362346877
## 256 -1.2535737261 -1.2535737261
## 257 -1.2610492662 -1.2610492662
## 258 -1.2614545995 -1.2614545995
## 259 -1.3242804771 -1.3242804771
## 260 -1.3436752336 -1.3436752336
## 261 -1.3973838134 -1.3973838134
## 262 -1.4170959743 -1.4170959743
## 263 -1.4457616393 -1.4457616393
## 264 -1.5439461410 -1.5439461410
## 265 -1.5645077672 -1.5645077672
## 266 -1.5789871207 -1.5789871207
## 267 -1.6552044384 -1.6552044384
## 268 -1.6710730636 -1.6710730636
## 269 -1.6835838774 -1.6835838774
## 270 -1.7165556986 -1.7165556986
## 271 -1.9405152899 -1.9405152899
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, for each feature we keep the maximum
  # importance value across classes and add it as a MaxImportance column.
  importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
  importance_rf_model_df <- importance_rf_model_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))

  print(importance_rf_model_df)
  
}
if(METHOD_FEATURE_FLAG == 1){
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 1){
  
  print(importance_rf_model_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_rf_model_df,n=20)$Feature)
  
  importance_melted_rf_model_df <- importance_rf_model_df %>%
    head(20)%>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")

  ggplot(importance_melted_rf_model_df, 
         aes(x = reorder(Feature, -Importance), 
             y = Importance, fill = Class)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Across Classes",
       x = "Feature",
       y = "Importance",
       fill = "Class") +
  theme_minimal()

}
if(METHOD_FEATURE_FLAG == 5){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")

  roc_curve <- roc(test_data_RFM1$DX,
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_RFM1$DX)))
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_rf_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## Area under the curve: 0.7182

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_RFM1$DX)
  
  for (class in classes) {
  binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
  roc_curve <- roc(binary_labels, prob_predictions[, class])
  roc_curves[[class]] <- roc_curve
  auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
  cat("Class:", class, "\n")
  print(roc_curves[[class]])
  cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  

  
  plot(roc_curves[[1]], col = "blue", 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  
  for (i in 2:length(classes)) {
  lines(roc_curves[[i]], col = i+1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)

   
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_rf_AUC<-mean_auc
}
print(FeatEval_Freq_rf_AUC)
## Area under the curve: 0.7182

9.3.6. SVM

9.3.6.1 SVM Model Training

df_SVM<-processed_data 
featureName_SVM1<-AfterProcess_FeatureName
set.seed(123)  # use the same split seed as the other model sections
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]

X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
                   method = "svmRadial",
                   trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 454 samples
## 271 predictors
##   2 classes: 'CI', 'CN' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 363, 363, 363, 363, 364 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.8061538  0.5878466
##   0.50  0.8039316  0.5836392
##   1.00  0.8083272  0.5818052
## 
## Tuning parameter 'sigma' was held constant at a value of 0.001883387
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.001883387 and C = 1.
print(svm_model$bestTune)
##         sigma C
## 3 0.001883387 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8061376
FeatEval_Freq_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Freq_mean_accuracy_cv_svm)
## [1] 0.8061376
train_predictions <- predict(svm_model, newdata = train_data_SVM1)

train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy:  0.984581497797357"
FeatEval_Freq_svm_trainAccuracy <- train_accuracy
print(FeatEval_Freq_svm_trainAccuracy)
## [1] 0.9845815
predictions <- predict(svm_model, newdata = test_data_SVM1)

cm_FeatEval_Freq_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Freq_svm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  CI  CN
##         CI 113   5
##         CN  15  61
##                                           
##                Accuracy : 0.8969          
##                  95% CI : (0.8453, 0.9359)
##     No Information Rate : 0.6598          
##     P-Value [Acc > NIR] : 1.772e-14       
##                                           
##                   Kappa : 0.7785          
##                                           
##  Mcnemar's Test P-Value : 0.04417         
##                                           
##             Sensitivity : 0.8828          
##             Specificity : 0.9242          
##          Pos Pred Value : 0.9576          
##          Neg Pred Value : 0.8026          
##              Prevalence : 0.6598          
##          Detection Rate : 0.5825          
##    Detection Prevalence : 0.6082          
##       Balanced Accuracy : 0.9035          
##                                           
##        'Positive' Class : CI              
## 
cm_FeatEval_Freq_svm_Accuracy <- cm_FeatEval_Freq_svm$overall["Accuracy"]
cm_FeatEval_Freq_svm_Kappa <- cm_FeatEval_Freq_svm$overall["Kappa"]
print(cm_FeatEval_Freq_svm_Accuracy)
##  Accuracy 
## 0.8969072
print(cm_FeatEval_Freq_svm_Kappa)
##     Kappa 
## 0.7784882

Let’s take a look at the feature importance of the trained model.

library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method:  FeatureImp 
## error function: ce
## 
## Analysed predictor: 
## Prediction task: classification 
## Classes:  
## 
## Analysed data:
## Sampling from data.frame with 648 rows and 272 columns.
## 
## 
## Head of results:
##      feature importance.05 importance importance.95 permutation.error
## 1 cg22071943     1.1555556   1.222222      1.251852        0.05092593
## 2 cg09785377     1.0962963   1.222222      1.251852        0.05092593
## 3 cg09015880     0.9925926   1.185185      1.281481        0.04938272
## 4 cg21697769     1.1185185   1.185185      1.222222        0.04938272
## 5 cg02078724     1.1185185   1.185185      1.222222        0.04938272
## 6 cg25758034     1.1259259   1.185185      1.288889        0.04938272
plot(importance_SVM)

library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)

importance_SVM_df<-importance_SVM$results
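Because the iml results are a plain data frame (columns feature, importance, and so on), the top CpGs by permutation importance can be extracted directly; keeping 20 here is illustrative:

top_SVM_features <- importance_SVM_df %>%
  dplyr::arrange(desc(importance)) %>%
  head(20) %>%
  dplyr::pull(feature)
print(top_SVM_features)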
if(METHOD_FEATURE_FLAG == 5){
  
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "MCI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
  
  library(e1071)
  library(pROC)  # roc() and plot.roc() are provided by pROC, not e1071
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "Dementia"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
if(METHOD_FEATURE_FLAG == 3){
  
  library(e1071)
  library(pROC)  # roc() and plot.roc() are provided by pROC, not e1071
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX, 
                   prob_predictions[, "CI"], 
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)

  print("The auc vlue is:")
  auc_value <- roc_curve$auc

  print(auc_value) 
  FeatEval_Freq_svm_AUC<-auc_value

  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  
}
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[,     "CI"], levels = rev(levels(test_data_SVM1$DX)))
## 
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9632
## [1] "The auc vlue is:"
## Area under the curve: 0.9632
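
The three binary branches above differ only in which class column of the predicted probabilities is treated as positive; a minimal refactoring sketch (assuming pROC is loaded and the model supports type = "prob"):

# Sketch: one helper covering every binary case; positive_class is "CI",
# "Dementia", or "MCI" depending on METHOD_FEATURE_FLAG.
plot_binary_roc <- function(model, test_df, positive_class) {
  probs <- predict(model, newdata = test_df, type = "prob")
  rc <- pROC::roc(test_df$DX, probs[, positive_class],
                  levels = rev(levels(test_df$DX)))
  plot(rc, col = "blue", lwd = 2, main = "ROC Curve")
  rc$auc
}
# e.g. FeatEval_Freq_svm_AUC <- plot_binary_roc(svm_model, test_data_SVM1, "CI")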

if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(test_data_SVM1$DX)
  
  # One-vs-rest: build a binary ROC curve for each class
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  
  # Use one color per class so the curves and the legend match
  class_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = class_cols[1], 
       lwd = 2, 
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
    mean_auc <- mean(auc_values)
    cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
    FeatEval_Freq_svm_AUC<-mean_auc
}
print(FeatEval_Freq_svm_AUC)
## Area under the curve: 0.9632

10. Performance Metrics

In the INPUT Session, “Metrics_Table_Output_FLAG” is the flag that controls whether the metrics in this file are written out. The output includes the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods.
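
Because the stored metric variables follow a single naming convention (e.g. “FeatEval_Freq_svm_AUC”, “cm_modelTrain_LRM1_Kappa”), the per-model vectors below could also be assembled programmatically. A minimal sketch, not part of the pipeline, assuming every referenced variable already exists in the environment:

# Sketch: build one five-value metric vector from the naming convention used
# in this file; e.g. metric_vector("FeatEval_Freq", "svm") reproduces the
# values of ModelTrain_stage_SVM_metrics_Feature_Freq below.
metric_vector <- function(stage, model) {
  nms <- c(paste0(stage, "_", model, "_trainAccuracy"),
           paste0("cm_", stage, "_", model, "_Accuracy"),
           paste0("cm_", stage, "_", model, "_Kappa"),
           paste0(stage, "_", model, "_AUC"),
           paste0(stage, "_mean_accuracy_cv_", model))
  vapply(nms, function(nm) as.numeric(get(nm)), numeric(1))
}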

Feature_and_model_Metrics <- c("Training Accuracy", "Test Accuracy", "Test Kappa", "AUC", "Average Test Accuracy during Cross Validation")

ModelTrain_stage_Logistic_metrics_ModelTrainStage <- c(modelTrain_LRM1_trainAccuracy, cm_modelTrain_LRM1_Accuracy, cm_modelTrain_LRM1_Kappa,modelTrain_LRM1_AUC, modelTrain_mean_accuracy_cv_LRM1) 

ModelTrain_stage_Logistic_metrics_Feature_Mean<-c(FeatEval_Mean_LRM1_trainAccuracy,
cm_FeatEval_Mean_LRM1_Accuracy,cm_FeatEval_Mean_LRM1_Kappa,FeatEval_Mean_LRM1_AUC, FeatEval_Mean_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics_Feature_Median<-c(FeatEval_Median_LRM1_trainAccuracy,
cm_FeatEval_Median_LRM1_Accuracy,cm_FeatEval_Median_LRM1_Kappa,FeatEval_Median_LRM1_AUC, FeatEval_Median_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics_Feature_Freq<-c(FeatEval_Freq_LRM1_trainAccuracy,
cm_FeatEval_Freq_LRM1_Accuracy,cm_FeatEval_Freq_LRM1_Kappa,FeatEval_Freq_LRM1_AUC,FeatEval_Freq_mean_accuracy_cv_LRM1)

ModelTrain_stage_Logistic_metrics<-c(ModelTrain_stage_Logistic_metrics_ModelTrainStage, ModelTrain_stage_Logistic_metrics_Feature_Mean,ModelTrain_stage_Logistic_metrics_Feature_Median,ModelTrain_stage_Logistic_metrics_Feature_Freq)
ModelTrain_stage_ElasticNet_metrics_ModelTrainStage <- c(modelTrain_ENM1_trainAccuracy, cm_modelTrain_ENM1_Accuracy, cm_modelTrain_ENM1_Kappa,modelTrain_ENM1_AUC, modelTrain_mean_accuracy_cv_ENM1) 

ModelTrain_stage_ElasticNet_metrics_Feature_Mean<-c(FeatEval_Mean_ENM1_trainAccuracy,
cm_FeatEval_Mean_ENM1_Accuracy,cm_FeatEval_Mean_ENM1_Kappa,FeatEval_Mean_ENM1_AUC, FeatEval_Mean_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics_Feature_Median<-c(FeatEval_Median_ENM1_trainAccuracy,
cm_FeatEval_Median_ENM1_Accuracy,cm_FeatEval_Median_ENM1_Kappa,FeatEval_Median_ENM1_AUC, FeatEval_Median_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics_Feature_Freq<-c(FeatEval_Freq_ENM1_trainAccuracy,
cm_FeatEval_Freq_ENM1_Accuracy,cm_FeatEval_Freq_ENM1_Kappa,FeatEval_Freq_ENM1_AUC,FeatEval_Freq_mean_accuracy_cv_ENM1)

ModelTrain_stage_ElasticNet_metrics<-c(ModelTrain_stage_ElasticNet_metrics_ModelTrainStage, ModelTrain_stage_ElasticNet_metrics_Feature_Mean,ModelTrain_stage_ElasticNet_metrics_Feature_Median,ModelTrain_stage_ElasticNet_metrics_Feature_Freq)
ModelTrain_stage_XGBoost_metrics_ModelTrainStage <- c(modelTrain_xgb_trainAccuracy, cm_modelTrain_xgb_Accuracy, cm_modelTrain_xgb_Kappa,modelTrain_xgb_AUC, modelTrain_mean_accuracy_cv_xgb) 

ModelTrain_stage_XGBoost_metrics_Feature_Mean<-c(FeatEval_Mean_xgb_trainAccuracy,
cm_FeatEval_Mean_xgb_Accuracy,cm_FeatEval_Mean_xgb_Kappa,FeatEval_Mean_xgb_AUC, FeatEval_Mean_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics_Feature_Median<-c(FeatEval_Median_xgb_trainAccuracy,
cm_FeatEval_Median_xgb_Accuracy,cm_FeatEval_Median_xgb_Kappa,FeatEval_Median_xgb_AUC, FeatEval_Median_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics_Feature_Freq<-c(FeatEval_Freq_xgb_trainAccuracy,
cm_FeatEval_Freq_xgb_Accuracy,cm_FeatEval_Freq_xgb_Kappa,FeatEval_Freq_xgb_AUC,FeatEval_Freq_mean_accuracy_cv_xgb)

ModelTrain_stage_XGBoost_metrics<-c(ModelTrain_stage_XGBoost_metrics_ModelTrainStage, ModelTrain_stage_XGBoost_metrics_Feature_Mean,ModelTrain_stage_XGBoost_metrics_Feature_Median,ModelTrain_stage_XGBoost_metrics_Feature_Freq)
ModelTrain_stage_RandomForest_metrics_ModelTrainStage <- c(modelTrain_rf_trainAccuracy, cm_modelTrain_rf_Accuracy, cm_modelTrain_rf_Kappa,modelTrain_rf_AUC, modelTrain_mean_accuracy_cv_rf) 

ModelTrain_stage_RandomForest_metrics_Feature_Mean<-c(FeatEval_Mean_rf_trainAccuracy,
cm_FeatEval_Mean_rf_Accuracy,cm_FeatEval_Mean_rf_Kappa,FeatEval_Mean_rf_AUC, FeatEval_Mean_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics_Feature_Median<-c(FeatEval_Median_rf_trainAccuracy,
cm_FeatEval_Median_rf_Accuracy,cm_FeatEval_Median_rf_Kappa,FeatEval_Median_rf_AUC, FeatEval_Median_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics_Feature_Freq<-c(FeatEval_Freq_rf_trainAccuracy,
cm_FeatEval_Freq_rf_Accuracy,cm_FeatEval_Freq_rf_Kappa,FeatEval_Freq_rf_AUC,FeatEval_Freq_mean_accuracy_cv_rf)

ModelTrain_stage_RandomForest_metrics<-c(ModelTrain_stage_RandomForest_metrics_ModelTrainStage, ModelTrain_stage_RandomForest_metrics_Feature_Mean,ModelTrain_stage_RandomForest_metrics_Feature_Median,ModelTrain_stage_RandomForest_metrics_Feature_Freq)
ModelTrain_stage_SVM_metrics_ModelTrainStage <- c(modelTrain_svm_trainAccuracy, cm_modelTrain_svm_Accuracy, cm_modelTrain_svm_Kappa,modelTrain_svm_AUC, modelTrain_mean_accuracy_cv_svm) 

ModelTrain_stage_SVM_metrics_Feature_Mean<-c(FeatEval_Mean_svm_trainAccuracy,
cm_FeatEval_Mean_svm_Accuracy,cm_FeatEval_Mean_svm_Kappa,FeatEval_Mean_svm_AUC, FeatEval_Mean_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics_Feature_Median<-c(FeatEval_Median_svm_trainAccuracy,
cm_FeatEval_Median_svm_Accuracy,cm_FeatEval_Median_svm_Kappa,FeatEval_Median_svm_AUC, FeatEval_Median_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics_Feature_Freq<-c(FeatEval_Freq_svm_trainAccuracy,
cm_FeatEval_Freq_svm_Accuracy,cm_FeatEval_Freq_svm_Kappa,FeatEval_Freq_svm_AUC,FeatEval_Freq_mean_accuracy_cv_svm)

ModelTrain_stage_SVM_metrics<-c(ModelTrain_stage_SVM_metrics_ModelTrainStage, ModelTrain_stage_SVM_metrics_Feature_Mean,ModelTrain_stage_SVM_metrics_Feature_Median,ModelTrain_stage_SVM_metrics_Feature_Freq)
# Map METHOD_FEATURE_FLAG to a readable classification type label
classificationType <- switch(as.character(METHOD_FEATURE_FLAG),
  "1" = "Multiclass",
  "2" = "Multiclass and use PCA",
  "3" = "Binary",
  "4" = "CN vs Dementia (AD)",
  "5" = "CN vs MCI",
  "6" = "MCI vs Dementia"
)
library(dplyr)

Metrics_results_df <- data.frame(
  `Number_of_CpG_used` = rep(Number_N_TopNCpGs, 20),
  `Number_of_Phenotype_Features_Used` = rep(5, 20),
  `Total_Number_of_features_before_Preprocessing` = rep(Number_N_TopNCpGs+5, 20),
  `Number_of_features_after_processing` = rep(Num_feaForProcess, 20),
  `Classification_Type` = rep(classificationType, 20),
  `Number_of_Key_features_Selected_(Mean,Median)` = rep(INPUT_NUMBER_FEATURES, 20),
  `Number_of_Key_features_remained_based_on_frequency_methods` = rep(Num_KeyFea_Frequency, 20),
  `Metrics_Stage` = c(rep("Model Train Stage",5),rep("Key Feature Evaluation (Select based on Mean) ",5),rep("Key Feature Evaluation (Select based on Median) ",5),rep("Key Feature Evaluation (Select based on Frequency) ",5)),
  `Metric` = rep(Feature_and_model_Metrics, 4),
  `Logistic_regression` = c(ModelTrain_stage_Logistic_metrics),
  `Elastic_Net` = c(ModelTrain_stage_ElasticNet_metrics),
  `XGBoost` = c(ModelTrain_stage_XGBoost_metrics),
  `Random_Forest` = c(ModelTrain_stage_RandomForest_metrics),
  `SVM` = c(ModelTrain_stage_SVM_metrics)
)


print(Metrics_results_df)
##    Number_of_CpG_used Number_of_Phenotype_Features_Used Total_Number_of_features_before_Preprocessing Number_of_features_after_processing Classification_Type
## 1                5000                                 5                                          5005                                 313              Binary
## 2                5000                                 5                                          5005                                 313              Binary
## 3                5000                                 5                                          5005                                 313              Binary
## 4                5000                                 5                                          5005                                 313              Binary
## 5                5000                                 5                                          5005                                 313              Binary
## 6                5000                                 5                                          5005                                 313              Binary
## 7                5000                                 5                                          5005                                 313              Binary
## 8                5000                                 5                                          5005                                 313              Binary
## 9                5000                                 5                                          5005                                 313              Binary
## 10               5000                                 5                                          5005                                 313              Binary
## 11               5000                                 5                                          5005                                 313              Binary
## 12               5000                                 5                                          5005                                 313              Binary
## 13               5000                                 5                                          5005                                 313              Binary
## 14               5000                                 5                                          5005                                 313              Binary
## 15               5000                                 5                                          5005                                 313              Binary
## 16               5000                                 5                                          5005                                 313              Binary
## 17               5000                                 5                                          5005                                 313              Binary
## 18               5000                                 5                                          5005                                 313              Binary
## 19               5000                                 5                                          5005                                 313              Binary
## 20               5000                                 5                                          5005                                 313              Binary
##    Number_of_Key_features_Selected_.Mean.Median. Number_of_Key_features_remained_based_on_frequency_methods                                       Metrics_Stage
## 1                                            250                                                        271                                   Model Train Stage
## 2                                            250                                                        271                                   Model Train Stage
## 3                                            250                                                        271                                   Model Train Stage
## 4                                            250                                                        271                                   Model Train Stage
## 5                                            250                                                        271                                   Model Train Stage
## 6                                            250                                                        271      Key Feature Evaluation (Select based on Mean) 
## 7                                            250                                                        271      Key Feature Evaluation (Select based on Mean) 
## 8                                            250                                                        271      Key Feature Evaluation (Select based on Mean) 
## 9                                            250                                                        271      Key Feature Evaluation (Select based on Mean) 
## 10                                           250                                                        271      Key Feature Evaluation (Select based on Mean) 
## 11                                           250                                                        271    Key Feature Evaluation (Select based on Median) 
## 12                                           250                                                        271    Key Feature Evaluation (Select based on Median) 
## 13                                           250                                                        271    Key Feature Evaluation (Select based on Median) 
## 14                                           250                                                        271    Key Feature Evaluation (Select based on Median) 
## 15                                           250                                                        271    Key Feature Evaluation (Select based on Median) 
## 16                                           250                                                        271 Key Feature Evaluation (Select based on Frequency) 
## 17                                           250                                                        271 Key Feature Evaluation (Select based on Frequency) 
## 18                                           250                                                        271 Key Feature Evaluation (Select based on Frequency) 
## 19                                           250                                                        271 Key Feature Evaluation (Select based on Frequency) 
## 20                                           250                                                        271 Key Feature Evaluation (Select based on Frequency) 
##                                           Metric Logistic_regression Elastic_Net   XGBoost Random_Forest       SVM
## 1                              Training Accuracy           0.9977974   0.9735683 1.0000000  1.0000000000 0.9911894
## 2                                  Test Accuracy           0.8762887   0.8969072 0.7525773  0.6701030928 0.8659794
## 3                                     Test Kappa           0.7095084   0.7560362 0.3755365  0.0487281643 0.7057863
## 4                                            AUC           0.9148911   0.9457860 0.7602983  0.6900449811 0.9156013
## 5  Average Test Accuracy during Cross Validation           0.7295157   0.7226380 0.6381592  0.6637606838 0.8296215
## 6                              Training Accuracy           0.9977974   0.9977974 1.0000000  1.0000000000 0.9801762
## 7                                  Test Accuracy           0.8556701   0.9020619 0.6907216  0.6701030928 0.8505155
## 8                                     Test Kappa           0.6636949   0.7709136 0.2321900  0.0664661654 0.6730210
## 9                                            AUC           0.8995028   0.9157197 0.7398201  0.7015861742 0.9421165
## 10 Average Test Accuracy during Cross Validation           0.7483299   0.7279298 0.6529530  0.6659584860 0.8259829
## 11                             Training Accuracy           0.9977974   0.9977974 1.0000000  1.0000000000 0.9845815
## 12                                 Test Accuracy           0.8711340   0.8917526 0.6701031  0.6546391753 0.8195876
## 13                                    Test Kappa           0.7031460   0.7487357 0.1810026 -0.0006158584 0.6109774
## 14                                           AUC           0.9092093   0.9243608 0.7262074  0.7142518939 0.9070786
## 15 Average Test Accuracy during Cross Validation           0.7681699   0.7355177 0.6460783  0.6622629223 0.8457957
## 16                             Training Accuracy           1.0000000   0.9977974 1.0000000  1.0000000000 0.9845815
## 17                                 Test Accuracy           0.8608247   0.8917526 0.6855670  0.6804123711 0.8969072
## 18                                    Test Kappa           0.6818127   0.7467993 0.2096968  0.0784554091 0.7784882
## 19                                           AUC           0.9074337   0.9314631 0.7292850  0.7181581439 0.9631866
## 20 Average Test Accuracy during Cross Validation           0.7461349   0.7288590 0.6481660  0.6644851445 0.8061376

Write out the data frame (Model Metrics) to a CSV file if FLAG_WRITE_METRICS_DF = TRUE.

if(FLAG_WRITE_METRICS_DF){
  write.csv(Metrics_results_df,OUTUT_PerformanceMetricsCSV_PATHNAME,row.names = FALSE)
  print("Metrics Performance output path:")
  print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
## [1] "Metrics Performance output path:"
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"

Appendix - Variables

Overview of the Data Frame Variables.

  • Phenotype Part Data frame: “phenoticPart_RAW”

  • RAW Merged Data frame: “merged_df_raw”

  • Processed Data, i.e. the data used for model training.

    • The name for “processed_data” could be:

      • “processed_data_m1”, which uses method one to process the data.

      • “processed_data_m2”, which uses method two to process the data; note that the features will be principal components.

      • “processed_data_m3”, which uses method three to process the data. This method transfers “DX” to a binary class: “CN” stays the same, and “MCI” and “Dementia” are relabeled as “CI” (see the sketch after this list).

        Note that “processed_data_m3_df” is the data frame form of “processed_data_m3”, with sample names as row names; it is assigned to “processed_dataFrame”.

      • “processed_data_m4”, which uses method four to process the data. This method filters “DX” (dropping the “MCI” class), limiting the data to the CN and Dementia (AD) classes.

      • “processed_data_m5”, which uses method five to process the data. This method filters “DX” (dropping the “Dementia” class), limiting the data to the CN and MCI classes.

      • “processed_data_m6”, which uses method six to process the data. This method filters “DX” (dropping the “CN” class), limiting the data to the MCI and Dementia classes.

    • The name for “AfterProcess_FeatureName” could be:

      • “AfterProcess_FeatureName_m1”, the column names of the data frame processed with method one.
      • “AfterProcess_FeatureName_m2”, the column names under the principal component method.
      • “AfterProcess_FeatureName_m3”, the column names of the data frame processed with method three (“DX” transferred to the binary class, as above).
      • “AfterProcess_FeatureName_m4”, the column names of the data frame processed with method four (CN vs Dementia (AD)).
      • “AfterProcess_FeatureName_m5”, the column names of the data frame processed with method five (CN vs MCI).
      • “AfterProcess_FeatureName_m6”, the column names of the data frame processed with method six (MCI vs Dementia).
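
    The binary relabeling used by method three is, in essence, a two-level factor recode; a minimal sketch with a hypothetical data frame “df”:

    # Sketch of the method-three recode: "CN" stays, "MCI"/"Dementia" become "CI".
    df$DX <- factor(ifelse(df$DX == "CN", "CN", "CI"), levels = c("CI", "CN"))
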
  • Ordered Feature Importance Based on Quantile Data Frame: “combined_importance_quantiles”

  • Ordered Feature Importance Based on Mean Data Frame: “combined_importance_Avg_ordered”

  • Feature Frequency / Common Data Frame:

    • “frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count. The Top number selected in the first step is set in the input session via “INPUT_NUMBER_FEATURES”.

    • “feature_df_full”: the frequency of all features under the steps of the frequency method; it is not ordered.

    • “all_combined_df_impAvg”: the combined table of frequency and feature importance; it is not ordered.

  • Output data frame with selected features based on the mean method: “df_selected_Mean”. This data frame does not have a column named “SampleID”.

    • And the feature names: “selected_impAvg_ordered_NAME”
  • Output data frame with selected features based on the median method: “df_selected_Median”. This data frame does not have a column named “SampleID”.

    • And the feature names: “Selected_median_imp_Name”
  • Output data frame with selected features based on the frequency / common feature method: “df_process_Output_freq”. This data frame does not have a column named “SampleID”.

    • And the feature names: “df_process_frequency_FeatureName”

    • “df_feature_Output_frequency”: the selected features’ frequencies, ordered by total frequency count. The Top number selected in the first step is set in the input session via “NUM_COMMON_FEATURES_SET_Frequency”.

    • “Selected_Frequency_Feature_importance”: the importance values of the selected features, ordered by total frequency count.

    • “feature_output_df_full”: the frequency of all features under the steps of the frequency method; it is not ordered.

    • “all_Output_combined_df_impAvg”: the combined table of frequency and feature importance; it is not ordered.

Overview of the Metrics Variables.

  • Number of CpGs used: “Number_N_TopNCpGs”

  • Phenotype features selected:

    • Multi: “age.now”, “PTGENDER”, “PC1”, “PC2”, “PC3” (Total number: 5)
    • Binary: “age.now”, “PTGENDER”, “PC1”, “PC2”, “PC3” (Total number: 5)
  • Number of features before processing: (#Phenotype features selected) + (#CpGs used)

  • Number of features after processing (DMP, data cleaning): “Num_feaForProcess”

  • Model performance (variable names) - Model Training Stage:

    | Initial Model Training Metric | Logistic regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | modelTrain_LRM1_trainAccuracy | modelTrain_ENM1_trainAccuracy | modelTrain_xgb_trainAccuracy | modelTrain_rf_trainAccuracy | modelTrain_svm_trainAccuracy |
    | Test Accuracy | cm_modelTrain_LRM1_Accuracy | cm_modelTrain_ENM1_Accuracy | cm_modelTrain_xgb_Accuracy | cm_modelTrain_rf_Accuracy | cm_modelTrain_svm_Accuracy |
    | Test Kappa | cm_modelTrain_LRM1_Kappa | cm_modelTrain_ENM1_Kappa | cm_modelTrain_xgb_Kappa | cm_modelTrain_rf_Kappa | cm_modelTrain_svm_Kappa |
    | AUC (for multiclass: mean AUC, one-vs-rest method) | modelTrain_LRM1_AUC | modelTrain_ENM1_AUC | modelTrain_xgb_AUC | modelTrain_rf_AUC | modelTrain_svm_AUC |
    | Average Test Accuracy during Cross Validation | modelTrain_mean_accuracy_cv_LRM1 | modelTrain_mean_accuracy_cv_ENM1 | modelTrain_mean_accuracy_cv_xgb | modelTrain_mean_accuracy_cv_rf | modelTrain_mean_accuracy_cv_svm |
  • Number of Key features selected (Mean/Median methods): “INPUT_NUMBER_FEATURES”

  • Number of Key features remaining based on the frequency method: “Num_KeyFea_Frequency”

  • Performance of the set of key features (selected under the 3 methods):

    Based on Mean:

    | Key Features Performance (Selected based on Mean) | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | FeatEval_Mean_LRM1_trainAccuracy | FeatEval_Mean_ENM1_trainAccuracy | FeatEval_Mean_xgb_trainAccuracy | FeatEval_Mean_rf_trainAccuracy | FeatEval_Mean_svm_trainAccuracy |
    | Test Accuracy | cm_FeatEval_Mean_LRM1_Accuracy | cm_FeatEval_Mean_ENM1_Accuracy | cm_FeatEval_Mean_xgb_Accuracy | cm_FeatEval_Mean_rf_Accuracy | cm_FeatEval_Mean_svm_Accuracy |
    | Test Kappa | cm_FeatEval_Mean_LRM1_Kappa | cm_FeatEval_Mean_ENM1_Kappa | cm_FeatEval_Mean_xgb_Kappa | cm_FeatEval_Mean_rf_Kappa | cm_FeatEval_Mean_svm_Kappa |
    | AUC (for multiclass: mean AUC, one-vs-rest method) | FeatEval_Mean_LRM1_AUC | FeatEval_Mean_ENM1_AUC | FeatEval_Mean_xgb_AUC | FeatEval_Mean_rf_AUC | FeatEval_Mean_svm_AUC |
    | Average Test Accuracy during Cross Validation | FeatEval_Mean_mean_accuracy_cv_LRM1 | FeatEval_Mean_mean_accuracy_cv_ENM1 | FeatEval_Mean_mean_accuracy_cv_xgb | FeatEval_Mean_mean_accuracy_cv_rf | FeatEval_Mean_mean_accuracy_cv_svm |

    Based on Median:

    | Key Features Performance (Selected based on Median) | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | FeatEval_Median_LRM1_trainAccuracy | FeatEval_Median_ENM1_trainAccuracy | FeatEval_Median_xgb_trainAccuracy | FeatEval_Median_rf_trainAccuracy | FeatEval_Median_svm_trainAccuracy |
    | Test Accuracy | cm_FeatEval_Median_LRM1_Accuracy | cm_FeatEval_Median_ENM1_Accuracy | cm_FeatEval_Median_xgb_Accuracy | cm_FeatEval_Median_rf_Accuracy | cm_FeatEval_Median_svm_Accuracy |
    | Test Kappa | cm_FeatEval_Median_LRM1_Kappa | cm_FeatEval_Median_ENM1_Kappa | cm_FeatEval_Median_xgb_Kappa | cm_FeatEval_Median_rf_Kappa | cm_FeatEval_Median_svm_Kappa |
    | AUC (for multiclass: mean AUC, one-vs-rest method) | FeatEval_Median_LRM1_AUC | FeatEval_Median_ENM1_AUC | FeatEval_Median_xgb_AUC | FeatEval_Median_rf_AUC | FeatEval_Median_svm_AUC |
    | Average Test Accuracy during Cross Validation | FeatEval_Median_mean_accuracy_cv_LRM1 | FeatEval_Median_mean_accuracy_cv_ENM1 | FeatEval_Median_mean_accuracy_cv_xgb | FeatEval_Median_mean_accuracy_cv_rf | FeatEval_Median_mean_accuracy_cv_svm |

    Based on Frequency:

    | Key Features Performance (Selected based on Frequency) | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
    |---|---|---|---|---|---|
    | Training Accuracy | FeatEval_Freq_LRM1_trainAccuracy | FeatEval_Freq_ENM1_trainAccuracy | FeatEval_Freq_xgb_trainAccuracy | FeatEval_Freq_rf_trainAccuracy | FeatEval_Freq_svm_trainAccuracy |
    | Test Accuracy | cm_FeatEval_Freq_LRM1_Accuracy | cm_FeatEval_Freq_ENM1_Accuracy | cm_FeatEval_Freq_xgb_Accuracy | cm_FeatEval_Freq_rf_Accuracy | cm_FeatEval_Freq_svm_Accuracy |
    | Test Kappa | cm_FeatEval_Freq_LRM1_Kappa | cm_FeatEval_Freq_ENM1_Kappa | cm_FeatEval_Freq_xgb_Kappa | cm_FeatEval_Freq_rf_Kappa | cm_FeatEval_Freq_svm_Kappa |
    | AUC (for multiclass: mean AUC, one-vs-rest method) | FeatEval_Freq_LRM1_AUC | FeatEval_Freq_ENM1_AUC | FeatEval_Freq_xgb_AUC | FeatEval_Freq_rf_AUC | FeatEval_Freq_svm_AUC |
    | Average Test Accuracy during Cross Validation | FeatEval_Freq_mean_accuracy_cv_LRM1 | FeatEval_Freq_mean_accuracy_cv_ENM1 | FeatEval_Freq_mean_accuracy_cv_xgb | FeatEval_Freq_mean_accuracy_cv_rf | FeatEval_Freq_mean_accuracy_cv_svm |